COMPARATIVE GENE TRANSCRIPT ANALYSIS



1. FIELD OF INVENTION 

The present invention is in the field of molecular
biology and computer science; more particularly, the
present invention describes methods of analyzing gene
transcripts and diagnosing the genetic expression of cells
and tissue.

2. BACKGROUND OF THE INVENTION 

Until very recently, the history of molecular biology
has been written one gene at a time. Scientists have
observed the cell's physical changes, isolated mixtures
from the cell or its milieu, purified proteins, sequenced
proteins and therefrom constructed probes to look for the
corresponding gene.

Recently, different nations have set up massive
projects to sequence the billions of bases in the human
genome. These projects typically begin by dividing the
genome into large portions of chromosomes and then
determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions
thereof, known as motifs. Unfortunately, the majority of
genomic DNA does not encode proteins, and though it is
postulated to have some effect on the cell's ability to
make protein, its relevance to medical applications is not
understood at this time.

A third methodology involves sequencing only the
transcripts encoding the cellular machinery actively
involved in making protein, namely the mRNA. The advantage
is that the cell has already edited out all the non-coding
DNA, and it is relatively easy to identify the
protein-coding portion of the RNA. The utility of this approach
was not immediately obvious to genomic researchers. In
fact, when cDNA sequencing was initially proposed, the
method was roundly denounced by those committed to genomic
sequencing. For example, the head of the U.S. Human Genome
Project discounted cDNA sequencing as not valuable and
refused to approve funding of such projects.

In this disclosure, we teach methods for analyzing
DNA, including cDNA libraries. Based on our analyses and
research, we see each individual gene product as a "pixel"
of information, which relates to the expression of that,
and only that, gene. We teach herein methods whereby the
individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which
each of the individual genes can be visualized
simultaneously, allowing relationships between the gene
pixels to be easily visualized and understood.

We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture,
one which describes the temporality or dynamics of gene
expression, at the level of a cell or a whole tissue. It
is that sense of "motion" of cellular machinery on the
scale of a cell or organ which constitutes the new
invention herein. This constitutes a new view into the
process of living cell physiology and one which holds great
promise to unveil and discover new therapeutic and
diagnostic approaches in medicine.

We teach another method which we call "electronic
northern," which tracks the expression of a single gene
across many types of cells and tissues.

Nucleic acids (DNA and RNA) carry within their
sequence the hereditary information and are therefore the
prime molecules of life. Nucleic acids are found in all
living organisms including bacteria, fungi, viruses, plants
and animals. It is of interest to determine the relative
abundance of different discrete nucleic acids in different
cells, tissues and organisms over time under various
conditions, treatments and regimes.

All dividing cells in the human body contain the same
set of 23 pairs of chromosomes. It is estimated that these
autosomal and sex chromosomes encode approximately 100,000
genes. The differences among different types of cells are
believed to reflect the differential expression of the
100,000 or so genes. Fundamental questions of biology
could be answered by understanding which genes are
transcribed and knowing the relative abundance of
transcripts in different cells.

Previously, the art has only provided for the analysis
of a few known genes at a time by standard molecular
biology techniques such as PCR, northern blot analysis, or
other types of DNA probe analysis such as in situ
hybridization. Each of these methods allows one to analyze
the transcription of only known genes and/or small numbers
of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991);
Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18,
2789-92 (1989); European J. Neuroscience 2, 1063-1073
(1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988);
Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci.
USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988);
Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose
transcription is induced or otherwise regulated during cell
processes such as activation, differentiation, aging, viral
transformation, morphogenesis, and mitosis have been
pursued for many years, using a variety of methodologies.
One of the earliest methods was to isolate and analyze
levels of the proteins in a cell, tissue, organ system, or
even organism both before and after the process of
interest. One method of analyzing multiple proteins in a
sample is 2-dimensional gel electrophoresis, wherein
proteins can, in principle, be identified and quantified as
individual bands, and ultimately reduced to a discrete
signal. At present, 2-dimensional analysis resolves only
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence
analysis using Edman degradation. Unfortunately, most of
the bands are present in quantities too small to obtain a
reliable sequence, and many of those bands contain more
than one discrete protein. An additional difficulty is
that many of the proteins are blocked at the
amino terminus, further complicating the sequencing
process.

Analyzing differentiation at the gene transcription
level has overcome many of these disadvantages and
drawbacks, since the power of recombinant DNA technology
allows amplification of signals from very small
amounts of material. The most common method, called
"hybridization subtraction," involves isolation of mRNA
from the biological specimen before (B) and after (A) the
developmental process of interest, transcribing one set of
mRNA into cDNA, subtracting specimen B from specimen A
(mRNA from cDNA) by hybridization, and constructing a cDNA
library from the non-hybridizing mRNA fraction. Many
different groups have used this strategy successfully, and
a variety of procedures have been published and improved
upon using this same basic scheme. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990);
Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 187,
364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70 (1990);
GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc.
Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res.
19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42
(1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular
strengths and weaknesses, there are still some limitations
and undesirable aspects of these methods. First, the time
and effort required to construct such libraries is quite
large. Typically, a trained molecular biologist might
expect construction and characterization of such a library
to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction
libraries are typically inferior to the libraries
constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of
at least 10^6 clones, and an average insert size of 1-3 kb.
In contrast, subtracted libraries can have complexities of
10^2 or 10^3 and average insert sizes of 0.2 kb. Therefore,
there can be a significant loss of clone and sequence
information associated with such libraries. Third, this
approach allows the researcher to capture only the genes
induced in specimen A relative to specimen B, not
vice versa, nor does it easily allow comparison to a third
specimen of interest (C). Fourth, this approach requires
very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number
and type of subtractions that are possible, since many
tissues and cells are very difficult to obtain in large
quantities.

Fifth, the resolution of the subtraction is dependent
upon the physical properties of DNA:DNA or RNA:DNA
hybridization. The ability of a given sequence to find a
hybridization match is dependent on its unique C0t value.
The C0t value is a function of the number of copies
(concentration) of the particular sequence, multiplied by
the time of hybridization. It follows that for sequences
which are abundant, hybridization events will occur very
rapidly (low C0t value), while rare sequences will form
duplexes only at very high C0t values. C0t values which allow
such rare sequences to form duplexes and therefore be
effectively selected are difficult to achieve in a
convenient time frame. Therefore, hybridization
subtraction is simply not a useful technique with which to
study relative levels of rare mRNA species. Sixth, this
problem is further complicated by the fact that duplex
formation is also dependent on the nucleotide base
composition of a given sequence. Those sequences rich in
G + C form stronger duplexes than those with high contents
of A + T. Therefore, the former sequences will tend to be
removed selectively by hybridization subtraction. Seventh,
it is possible that hybridization between nonexact matches
can occur. When this happens, the expression of a
homologous gene may "mask" expression of a gene of
interest, artificially skewing the results for that
particular gene.
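The C0t (Cot) argument above can be made concrete with the standard ideal second-order reassociation model, in which the fraction of a sequence still single-stranded after time t is C/C0 = 1/(1 + k·C0·t). That model and the rate constant k used below are assumptions brought in for illustration; real hybridizations also depend on salt, temperature, fragment length and base composition. The sketch shows that a sequence 1,000 times rarer needs 1,000 times longer to reach the same degree of duplex formation.

```python
def cot(c0: float, t: float) -> float:
    """C0t: initial concentration of a sequence multiplied by hybridization time."""
    return c0 * t


def fraction_single_stranded(k: float, c0: float, t: float) -> float:
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0 * t)."""
    return 1.0 / (1.0 + k * cot(c0, t))


def time_to_reach(k: float, c0: float, target_fraction: float) -> float:
    """Time at which the single-stranded fraction falls to target_fraction.

    Solving 1 / (1 + k*C0*t) = f for t gives t = (1/f - 1) / (k * C0).
    """
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must lie strictly between 0 and 1")
    return (1.0 / target_fraction - 1.0) / (k * c0)


if __name__ == "__main__":
    k = 100.0  # hypothetical rate constant, in L/(mol*s)
    abundant, rare = 1e-6, 1e-9  # hypothetical concentrations, in mol/L
    t_abundant = time_to_reach(k, abundant, 0.5)
    t_rare = time_to_reach(k, rare, 0.5)
    print(f"half-reassociation: abundant {t_abundant:.3g} s, rare {t_rare:.3g} s")
```

Because the required time scales as 1/C0, any fixed hybridization period that drives abundant sequences to completion leaves rare sequences largely unhybridized, which is the limitation the text describes.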

Matsubara and Okubo proposed using partial cDNA
sequences to establish expression profiles of genes which
could be used in functional analyses of the human genome.
Matsubara and Okubo warned against using random priming, as
it creates multiple unique DNA fragments from individual
mRNAs and may thus skew the analysis of the number of
particular mRNAs per library. They sequenced randomly
selected members from a 3'-directed cDNA library and
established the frequency of appearance of the various
expressed sequence tags (ESTs). They proposed comparing
lists of ESTs from various cell types to classify genes.
Genes expressed in many different cell types were labeled
housekeepers and those selectively expressed in certain
cells were labeled cell-specific genes, even in the absence
of the full sequence of the gene or the biological activity
of the gene product.
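The classification proposed by Matsubara and Okubo can be sketched as a counting operation over EST lists. The rule below, which calls a gene a housekeeper when it appears in at least a chosen fraction of the cell types and cell-specific when it appears in exactly one, is an illustrative assumption rather than their published criterion; the threshold and the gene names are hypothetical.

```python
from typing import Dict, Set, Tuple


def classify_genes(
    est_lists: Dict[str, Set[str]], housekeeping_fraction: float = 0.8
) -> Tuple[Set[str], Set[str]]:
    """Split genes into (housekeeping, cell_specific) by breadth of expression.

    est_lists maps each cell type to the set of genes detected among its ESTs.
    A gene seen in at least housekeeping_fraction of the cell types is treated
    as housekeeping; a gene seen in exactly one cell type is cell-specific.
    """
    n_types = len(est_lists)
    if n_types < 2:
        raise ValueError("need at least two cell types to compare")
    presence: Dict[str, int] = {}
    for genes in est_lists.values():
        for gene in genes:
            presence[gene] = presence.get(gene, 0) + 1
    housekeeping = {g for g, n in presence.items() if n >= housekeeping_fraction * n_types}
    cell_specific = {g for g, n in presence.items() if n == 1}
    return housekeeping, cell_specific


if __name__ == "__main__":
    est_lists = {
        "liver": {"actin", "albumin", "gapdh"},
        "brain": {"actin", "gapdh", "neurofilament"},
        "muscle": {"actin", "gapdh", "myosin"},
    }
    hk, specific = classify_genes(est_lists)
    print(sorted(hk), sorted(specific))
```

Note that this presence/absence rule ignores abundance entirely, which is one reason the present invention tabulates transcript counts rather than mere presence.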

The present invention avoids the drawbacks of the
prior art by providing a method to quantify the relative
abundance of multiple gene transcripts in a given
biological specimen by the use of high-throughput
sequence-specific analysis of individual RNAs and/or their
corresponding cDNAs.

The present invention offers several advantages over
current protein discovery methods which attempt to isolate
individual proteins based upon biological effects. The
method of the instant invention provides for detailed
diagnostic comparisons of cell profiles revealing numerous
changes in the expression of individual transcripts.

The instant invention provides several advantages over
current subtraction methods, including a more complex
library analysis (10^6 to 10^7 clones as compared to 10^3
clones), which allows identification of low-abundance
messages as well as enabling the identification of messages
which either increase or decrease in abundance. These
large libraries are routine to make, in contrast to the
libraries of previous methods. In addition, homologues can
easily be distinguished with the method of the instant
invention.

This method is very convenient because it organizes a
large quantity of data into a comprehensible, digestible
format. The most significant differences are highlighted
by electronic subtraction. In-depth analyses are made more
convenient.

The present invention provides several advantages over
previous methods of electronic analysis of cDNA. The
method is particularly powerful when more than 100, and
preferably more than 1,000, gene transcripts are analyzed.
In such a case, new low-frequency transcripts are
discovered and tissue typed.

High-resolution analysis of gene expression can be
used directly as a diagnostic profile or to identify
disease-specific genes for the development of more classic
diagnostic approaches.

This process is defined as gene transcript frequency
analysis. The resulting quantitative analysis of the gene
transcripts is defined as comparative gene transcript
analysis.

3. SUMMARY OF THE INVENTION

The invention is a method of analyzing a specimen
containing gene transcripts comprising the steps of (a)
producing a library of biological sequences; (b) generating
a set of transcript sequences, where each of the transcript
sequences in said set is indicative of a different one of
the biological sequences of the library; (c) processing the
transcript sequences in a programmed computer (in which a
database of reference transcript sequences indicative of
reference sequences is stored) to generate an identified
sequence value for each of the transcript sequences, where
each said identified sequence value is indicative of a
sequence annotation and a degree of match between one of
the biological sequences of the library and at least one of
the reference sequences; and (d) processing each said
identified sequence value to generate final data values
indicative of the number of times each identified sequence
value is present in the library.
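Steps (c) and (d) can be sketched as follows. The summary does not specify the comparison algorithm, so this sketch substitutes the Python standard library's `difflib.SequenceMatcher` similarity ratio as a stand-in for the degree of match (a production system would more likely use a sequence-alignment search against GENBANK or similar databases). The reference entries, the annotations and the 0.95/0.70 cut-offs for "exact", "similar" and non-match are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Iterable, Tuple

# An identified sequence value: (annotation of the best reference, match category).
IdentifiedValue = Tuple[str, str]


def identify(seq: str, references: Dict[str, str],
             exact_cutoff: float = 0.95, similar_cutoff: float = 0.70) -> IdentifiedValue:
    """Step (c): annotate one transcript sequence with its best-matching reference."""
    best_name, best_score = "unmatched", 0.0
    for name, ref_seq in references.items():
        # ratio() returns 2*M/T, where M is matched characters and T total length.
        score = SequenceMatcher(None, seq, ref_seq).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= exact_cutoff:
        return best_name, "exact"
    if best_score >= similar_cutoff:
        return best_name, "similar"
    return "unmatched", "non-match"


def final_data_values(transcripts: Iterable[str], references: Dict[str, str]) -> Counter:
    """Step (d): count how many times each identified sequence value occurs."""
    return Counter(identify(seq, references) for seq in transcripts)


if __name__ == "__main__":
    refs = {"geneA": "ATGGCCATTGTAATGGGCCGC", "geneB": "TTGACCGATGACCTGAAGTCA"}
    reads = [refs["geneA"], refs["geneA"], refs["geneB"], "G" * 20]
    for value, n in final_data_values(reads, refs).most_common():
        print(value, n)
```

Keeping the match category inside the identified value means that "exact" and "similar" hits to the same reference are tallied separately, which mirrors the degree-of-match component of the claimed identified sequence value.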

The invention also includes a method of comparing two
specimens containing gene transcripts. The first specimen
is processed as described above. The second specimen is
used to produce a second library of biological sequences,
which is used to generate a second set of transcript
sequences, where each of the transcript sequences in the
second set is indicative of one of the biological sequences
of the second library. Then the second set of transcript
sequences is processed in a programmed computer to generate
a second set of identified sequence values, namely the
further identified sequence values, each of which is
indicative of a sequence annotation and includes a degree
of match between one of the biological sequences of the
second library and at least one of the reference sequences.
The further identified sequence values are processed to
generate further final data values indicative of the number
of times each further identified sequence value is present
in the second library. The final data values from the
first specimen and the further final data values
from the second specimen are processed to generate ratios
of transcript sequences, which indicate the differences in
the number of gene transcripts between the two specimens.

In a further embodiment, the method includes
quantifying the relative abundance of mRNA in a biological
specimen by (a) isolating a population of mRNA transcripts
from a biological specimen; (b) identifying genes from
which the mRNA was transcribed by a sequence-specific
method; (c) determining the numbers of mRNA transcripts
corresponding to each of the genes; and (d) using the mRNA
transcript numbers to determine the relative abundance of
mRNA transcripts within the population of mRNA transcripts.

Also disclosed is a method of producing a gene
transcript image analysis by first obtaining a mixture of
mRNA, from which cDNA copies are made. The cDNA is
inserted into a suitable vector, which is used to transfect
suitable host strain cells, which are plated out and
permitted to grow into clones, each clone representing a
unique mRNA. A representative population of clones
transfected with cDNA is isolated. Each clone in the
population is identified by a sequence-specific method
which identifies the gene from which the unique mRNA was
transcribed. The number of times each gene is identified
to a clone is determined to evaluate gene transcript
abundance. The genes and their abundances are listed in
order of abundance to produce a gene transcript image.
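The final tabulation and ordering steps can be sketched as below, assuming each clone has already been assigned to a gene identifier (the identifiers are hypothetical). Dividing each count by the number of clones gives the relative abundance, and sorting by count produces the ranked list that the text calls a gene transcript image.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def gene_transcript_image(clone_genes: Iterable[str]) -> List[Tuple[str, int, float]]:
    """Return (gene, clone count, relative abundance) rows, most abundant first.

    Ties in count are broken alphabetically so the output is deterministic.
    """
    counts = Counter(clone_genes)
    total = sum(counts.values())
    rows = [(gene, n, n / total) for gene, n in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    clones = ["geneB", "geneA", "geneB", "geneC", "geneB", "geneA"]
    for rank, (gene, n, freq) in enumerate(gene_transcript_image(clones), start=1):
        print(rank, gene, n, f"{freq:.2f}")
```

The alphabetical tie-break is an arbitrary choice made here so that repeated runs give identical listings.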

In a further embodiment, the relative abundance of the
gene transcripts in one cell type or tissue is compared
with the relative abundance of gene transcript numbers in a
second cell type or tissue in order to identify the
differences and similarities.

In a further embodiment, the invention includes a system
for analyzing a library of biological sequences, including a
means for receiving a set of transcript sequences, where
each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and a means for processing the transcript sequences in a
computer system in which a database of reference transcript
sequences indicative of reference sequences is stored,
wherein the computer is programmed with software for
generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and the degree
of match between a different one of the biological
sequences of the library and at least one of the reference
sequences, and for processing each said identified sequence
value to generate final data values indicative of the
number of times each identified sequence value is present
in the library.

In essence, the invention is a method and system for
quantifying the relative abundance of gene transcripts in a
biological specimen. The invention provides a method for
comparing the gene transcript images from two or more
different biological specimens in order to distinguish
between the specimens and identify one or more genes
which are differentially expressed between them. Thus,
this gene transcript image and its comparison can be used
as a diagnostic. One embodiment of the method uses
high-throughput sequence-specific analysis of multiple RNAs
or their corresponding cDNAs to generate a gene transcript
image. Another embodiment of the method produces the gene
transcript imaging analysis by the use of high-throughput
cDNA sequence analysis. In addition, two or more gene
transcript images can be compared and used to detect or
diagnose a particular biological state, disease, or
condition which is correlated to the relative abundance
of gene transcripts in a given cell or population of cells.

4. DESCRIPTION OF THE TABLES AND DRAWINGS 

4.1. TABLES 

Table 1 presents a detailed explanation of the letter
codes utilized in Tables 2-5.

Table 2 lists the one hundred most common gene
transcripts. It is a partial list of isolates from the
HUVEC cDNA library prepared and sequenced as described
below. The left-hand column refers to the sequence's order
of abundance in this table. The next column, labeled
"number", is the clone number of the first HUVEC sequence
identification reference matching the sequence in the
"entry" column. Isolates that have not been
sequenced are not present in Table 2. The next column,
labeled "N", indicates the total number of cDNAs which have
the same degree of match with the sequence of the reference
transcript in the "entry" column.

The column labeled "entry" gives the NIH GENBANK locus
name, which corresponds to the library sequence numbers.
The "s" column indicates, in a few cases, the species of the
reference sequence. The code for column "s" is given in
Table 1. The column labeled "descriptor" provides a plain
English explanation of the identity of the sequence
corresponding to the NIH GENBANK locus name in the "entry"
column.

Table 3 is a comparison of the top fifteen most 
abundant gene transcripts in normal monocytes and activated 
macrophage cells. 

Table 4 is a detailed summary of the library subtraction
analysis comparing the THP-1 and human macrophage
cDNA sequences. In Table 4, the same code as in Table 2 is
used. Additional columns are "bgfreq" (abundance
number in the subtractant library), "rfend" (abundance
number in the target library) and "ratio" (the target
abundance number divided by the subtractant abundance
number). As is clear from perusal of the table, when the
abundance number in the subtractant library is "0", the
target abundance number is divided by 0.05. This is a way
of obtaining a result (division by 0 is not possible) and
of distinguishing the result from ratios in which the
subtractant abundance number is 1.
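The ratio rule described for Table 4 can be written out directly. The sketch below follows the stated convention of dividing by 0.05 when a transcript is absent from the subtractant library and works on raw abundance numbers, as the table does; the column names mirror the table, while the gene names are hypothetical. Under this rule a transcript seen once in the target library and never in the subtractant scores 20, which separates it from a 1:1 ratio.

```python
from typing import Dict, List, Tuple

ZERO_SUBSTITUTE = 0.05  # divisor used when the subtractant abundance number is 0


def subtraction_ratios(target: Dict[str, int],
                       subtractant: Dict[str, int]) -> List[Tuple[str, int, int, float]]:
    """Return (gene, bgfreq, rfend, ratio) rows sorted by descending ratio.

    bgfreq is the abundance number in the subtractant library, rfend the
    abundance number in the target library, and ratio = rfend / bgfreq,
    with bgfreq replaced by 0.05 when it is 0.
    """
    rows = []
    for gene in set(target) | set(subtractant):
        bgfreq = subtractant.get(gene, 0)
        rfend = target.get(gene, 0)
        ratio = rfend / (bgfreq if bgfreq else ZERO_SUBSTITUTE)
        rows.append((gene, bgfreq, rfend, ratio))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


if __name__ == "__main__":
    target = {"geneA": 3, "geneB": 4}
    subtractant = {"geneB": 2, "geneC": 5}
    for gene, bgfreq, rfend, ratio in subtraction_ratios(target, subtractant):
        print(gene, bgfreq, rfend, round(ratio, 2))
```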

Table 5 is the computer program, written in source
code, for generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used
in the electronic northern blot analysis as provided by the
present invention.

4.2. BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a chart summarising data collected and 
stored regarding the library construction portion of 
sequence preparation and analysis. 

Figure 2 is a diagram representing the sequence of
operations performed by "abundance sort" software in a
class of preferred embodiments of the inventive method.

Figure 3 is a block diagram of a preferred embodiment 
of the system of the invention. 

Figure 4 is a more detailed block diagram of the
bioinformatics process from new sequence (that has already
been sequenced but not identified) to printout of the
transcript imaging analysis and the provision of database
subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the
relative abundance of gene transcripts in different
biological specimens by the use of high-throughput
sequence-specific analysis of individual RNAs or their
corresponding cDNAs (or, alternatively, of data representing
other biological sequences). This process is denoted
herein as gene transcript imaging. The quantitative
analysis of the relative abundance for a set of gene
transcripts is denoted herein as "gene transcript image
analysis" or "gene transcript frequency analysis". The
present invention allows one to obtain a profile for gene
transcription in any given population of cells or tissue
from any type of organism. The invention can be applied to
obtain a profile of a specimen consisting of a single cell
(or clones of a single cell), or of many cells, or of
tissue more complex than a single cell and containing
multiple cell types, such as liver.

The invention has significant advantages in the fields
of diagnostics, toxicology and pharmacology, to name a few.
A highly sophisticated diagnostic test can be performed on
the ill patient in whom a diagnosis has not been made. A
biological specimen consisting of the patient's fluids or
tissues is obtained, and the gene transcripts are isolated
and expanded to the extent necessary to determine their
identity. Optionally, the gene transcripts can be
converted to cDNA. A sampling of the gene transcripts is
subjected to sequence-specific analysis and quantified.
These gene transcript sequence abundances are compared
against reference database sequence abundances, including
data sets for both diseased and healthy patients. The
patient has the disease(s) with which the patient's data
set most closely correlates.
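The comparison against reference data sets can be sketched as a nearest-profile search. The text says only that the patient's data set is matched to the reference set with which it most closely correlates; using the Pearson correlation over relative abundances of the union of genes seen in any profile is an assumption made here, and the reference labels and values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # gene -> relative abundance


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)


def closest_reference(patient: Profile, references: Dict[str, Profile]) -> Tuple[str, float]:
    """Return (label, r) for the reference profile best correlated with the patient's."""
    # Genes missing from a profile are treated as having zero abundance.
    genes = sorted(set(patient).union(*references.values()))
    p = [patient.get(g, 0.0) for g in genes]
    scored = [
        (label, pearson(p, [ref.get(g, 0.0) for g in genes]))
        for label, ref in references.items()
    ]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    references = {
        "healthy": {"g1": 0.5, "g2": 0.3, "g3": 0.2},
        "disease_X": {"g1": 0.1, "g2": 0.3, "g3": 0.6},
    }
    print(closest_reference({"g1": 0.15, "g2": 0.3, "g3": 0.55}, references))
```

A real diagnostic would need many more genes, replicate reference libraries and a measure of confidence; the sketch only illustrates the "most closely correlates" step.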

For example, gene transcript frequency analysis can be
used to differentiate normal cells or tissues from diseased
cells or tissues, just as it highlights differences between
normal monocytes and activated macrophages in Table 3.

In toxicology, a fundamental question is which tests
are most effective in predicting or detecting a toxic
effect. Gene transcript imaging provides highly detailed
information on the cell and tissue environment, some of
which would not be obvious in conventional, less detailed
screening methods. The gene transcript image is therefore a
more powerful method to predict drug toxicity and efficacy.
Similar benefits accrue in the use of this tool in
pharmacology. The gene transcript image can be used
selectively to look at protein categories which are
expected to be affected, for example, enzymes which
detoxify toxins.

In an alternative embodiment, comparative gene
transcript frequency analysis is used to differentiate
between cancer cells which respond to anti-cancer agents
and those which do not respond. Examples of anti-cancer
agents are tamoxifen, vincristine, vinblastine,
podophyllotoxins, etoposide, teniposide, cisplatin,
biologic response modifiers such as interferon, IL-2,
GM-CSF, enzymes, hormones and the like. This method also
provides a means for sorting the gene transcripts by
functional category. In the case of cancer cells,
transcription factors or other essential regulatory
molecules are very important categories to analyze across
different libraries.

In yet another embodiment, comparative gene transcript
frequency analysis is used to differentiate between control
liver cells and liver cells isolated from patients treated
with experimental drugs like FIAU, to distinguish between
pathology caused by the underlying disease and that caused
by the drug.

In yet another embodiment, comparative gene transcript 
frequency analysis is used to differentiate between brain 
tissue from patients treated and untreated with lithium. 

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between
cyclosporin- and FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between virally
infected (including HIV-infected) human cells and
uninfected human cells. Gene transcript frequency analysis
is also used to rapidly survey gene transcripts in
HIV-resistant, HIV-infected, and HIV-sensitive cells.
Comparison of gene transcript abundance will indicate the
success of treatment and/or new avenues to study.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between
bronchial lavage fluids from healthy and unhealthy patients
with a variety of ailments.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between cell,
plant, microbial and animal mutants and wild-type species.
In addition, the transcript abundance program is adapted to
permit the scientist to evaluate the transcription of one
gene in many different tissues. Such comparisons could
identify deletion mutants which do not produce a gene
product and point mutants which produce a less abundant or
otherwise different message. Such mutations can affect
basic biochemical and pharmacological processes, such as
mineral nutrition and metabolism, and can be isolated by
means known to those skilled in the art. Thus, crops with
improved yields, pest resistance and other favorable traits
can be developed.

In a further embodiment, comparative gene transcript
frequency analysis is used for an interspecies comparative
analysis which would allow for the selection of better
pharmacologic animal models. In this embodiment, humans
and other animals (such as a mouse), or their cultured
cells, are treated with a specific test agent. The relative
sequence abundance of each cDNA population is determined.
If the animal test system is a good model, homologous genes
in the animal cDNA population should change expression
similarly to those in human cells. If side effects are
detected with the drug, a detailed transcript abundance
analysis will be performed to survey gene transcript
changes. Models will then be evaluated by comparing basic
physiological changes.
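One way to score whether an animal model "changes expression similarly" is to compare the direction of change for homologous gene pairs. The homolog pairing, the fold-change inputs, the 10% tolerance band and the concordance score below are illustrative assumptions rather than a procedure given in the text; all gene identifiers are hypothetical.

```python
from typing import Dict


def direction(fold_change: float, tolerance: float = 0.1) -> int:
    """Classify a treated/untreated fold change as up (1), down (-1) or unchanged (0)."""
    if fold_change > 1.0 + tolerance:
        return 1
    if fold_change < 1.0 / (1.0 + tolerance):
        return -1
    return 0


def concordance(human: Dict[str, float], animal: Dict[str, float],
                homologs: Dict[str, str]) -> float:
    """Fraction of homolog pairs whose expression moves in the same direction.

    human and animal map gene identifiers to fold changes after treatment;
    homologs maps each human gene to its presumed animal homolog.
    """
    pairs = [(h, a) for h, a in homologs.items() if h in human and a in animal]
    if not pairs:
        raise ValueError("no homolog pairs with data in both species")
    agree = sum(direction(human[h]) == direction(animal[a]) for h, a in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    human = {"H1": 3.0, "H2": 0.2, "H3": 1.0}
    mouse = {"M1": 2.5, "M2": 1.8, "M3": 1.05}
    print(concordance(human, mouse, {"H1": "M1", "H2": "M2", "H3": "M3"}))
```

A higher score would suggest a model whose transcriptional response better tracks the human response for the genes examined.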

In a further embodiment, comparative gene transcript
frequency analysis is used in a clinical setting to give a
highly detailed gene transcript profile of a patient's
cells or tissue (for example, a blood sample). In
particular, gene transcript frequency analysis is used to
give a high-resolution gene expression profile of a
diseased state or condition.

In the preferred embodiment, the method utilizes
high-throughput cDNA sequencing to identify specific
transcripts of interest. The generated cDNA and deduced
amino acid sequences are then extensively compared with
GENBANK and other sequence data banks as described below.
The method offers several advantages over current protein
discovery by two-dimensional gel methods which try to
identify individual proteins involved in a particular
biological effect. Here, detailed comparisons of profiles
of activated and inactive cells reveal numerous changes in
the expression of individual transcripts. After it is
determined whether the sequence is an "exact" match, a
similar match or a non-match, the sequence is entered into
a database. Next, the numbers of copies of cDNA
corresponding to each gene are tabulated. Although this
can be done by hand from a printout of all entries, doing
so is slow, arduous and often impractical; a computer
program is a useful and rapid way to tabulate this
information. The numbers of cDNA copies (optionally
divided by the total number of sequences in the data set)
provide a picture of the relative abundance of
transcripts for each corresponding gene. The list of
represented genes can then be sorted by abundance in the
cDNA population. A multitude of additional types of
comparisons or dimensions are possible and are exemplified
below.

An alternate method of producing a gene transcript
image includes the steps of obtaining a mixture of test
mRNA and providing a representative array of unique probes
whose sequences are complementary to at least some of the
test mRNAs. Next, a fixed amount of the test mRNA is added
to the arrayed probes. The test mRNA is incubated with the
probes for a sufficient time to allow hybrids of the test
mRNA and probes to form. The mRNA-probe hybrids are
detected and their quantities determined. The hybrids are
identified by their location in the probe array. The
quantities of all hybrids are summed to give a population
number. Each hybrid quantity is divided by the population
number to provide a set of relative abundance data termed a
gene transcript image analysis.
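The arithmetic of this array-based image can be sketched as follows. The dictionary keyed by array location is an assumed input format chosen for illustration; the summation and division follow the steps described above.

```python
def relative_abundance(hybrid_quantities):
    """Turn detected hybrid quantities into a gene transcript image.

    `hybrid_quantities` maps an array location (which identifies the
    probe) to the measured quantity of mRNA-probe hybrid there.
    """
    population = sum(hybrid_quantities.values())  # the "population number"
    if population == 0:
        raise ValueError("no hybrids detected")
    return {loc: q / population for loc, q in hybrid_quantities.items()}


signals = {(0, 0): 120.0, (0, 1): 30.0, (1, 0): 50.0}
image = relative_abundance(signals)
```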

6. EXAMPLES

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 



6.1. TISSUE SOURCES AND CELL LINES

For analysis with the computer program claimed herein,
biological sequences can be obtained from virtually any
source. Most popular are tissues obtained from the human
body. Tissues can be obtained from any organ of the body,
any age donor, any abnormality or any immortalized cell
line. Immortal cell lines may be preferred in some
instances because of their purity of cell type; other
tissue samples invariably include mixed cell types. A
special technique is available to take a single cell (for
example, a brain cell) and harness the cellular machinery
to grow up sufficient cDNA for sequencing by the techniques
and analysis described herein (cf. U.S. Patent Nos.
5,021,335 and 5,168,038, which are incorporated by
reference). The examples given herein utilized the
following immortalized cell lines: monocyte-like U-937
cells, activated macrophage-like THP-1 cells, induced
vascular endothelial cells (HUVEC cells) and mast cell-like
HMC-1 cells.

The U-937 cell line is a human histiocytic lymphoma
cell line with monocyte characteristics, established from
malignant cells obtained from the pleural effusion of a
patient with diffuse histiocytic lymphoma (Sundstrom, C.
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is
one of only a few human cell lines with the morphology,
cytochemistry, surface receptors and monocyte-like
characteristics of histiocytic cells. These cells can be
induced to terminal monocytic differentiation and will
express new cell surface molecules when activated with
supernatants from human mixed lymphocyte cultures. Upon
this type of in vitro activation, the cells undergo
morphological and functional changes, including
augmentation of antibody-dependent cellular cytotoxicity
(ADCC) against erythroid and tumor target cells (one of the
principal functions of macrophages). Activation of U-937
cells with phorbol 12-myristate 13-acetate (PMA) in vitro
stimulates the production of several compounds, including
prostaglandins, leukotrienes and platelet-activating factor
(PAF), which are potent inflammatory mediators. Thus, U-937
is a cell line that is well suited for the
identification and isolation of gene transcripts associated
with normal monocytes.
The HUVEC cell line is a normal, homogeneous, well
characterized, early passage endothelial cell culture from
human umbilical vein (Cell Systems Corp., 12815 NE 124th
Street, Kirkland, WA 98034). Only gene transcripts from
induced, or treated, HUVEC cells were sequenced. One batch
of 1 × 10^8 cells was treated for 5 hours with 1 U/ml rIL-1β
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin
prior to harvesting. A separate batch of 2 × 10^8 cells was
treated at confluence with 4 U/ml TNF and 2 U/ml
interferon-gamma (IFN-gamma) prior to harvesting.

THP-1 is a human leukemic cell line with distinct
monocytic characteristics. This cell line was derived from
the blood of a 1-year-old boy with acute monocytic leukemia
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The
following cytological and cytochemical criteria were used
to determine the monocytic nature of the cell line: 1) the
presence of alpha-naphthyl butyrate esterase activity which
could be inhibited by sodium fluoride; 2) the production of
lysozyme; 3) the phagocytosis of latex particles and
sensitized SRBC (sheep red blood cells); and 4) the ability
of mitomycin C-treated THP-1 cells to activate
T-lymphocytes following ConA (concanavalin A) treatment.
Morphologically, the cytoplasm contained small azurophilic
granules and the nucleus was indented and irregularly
shaped with deep folds. The cell line had Fc and C3b
receptors, probably functioning in phagocytosis. THP-1
cells treated with the tumor promoter
12-O-tetradecanoylphorbol-13-acetate (TPA) stop proliferating and
differentiate into macrophage-like cells which mimic native
monocyte-derived macrophages in several respects.
Morphologically, as the cells change shape, the nucleus
becomes more irregular and additional phagocytic vacuoles
appear in the cytoplasm. The differentiated THP-1 cells
also exhibit an increased adherence to tissue culture
plastic.

HMC-1 cells (a human mast cell line) were established
from the peripheral blood of a Mayo Clinic patient with
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The
cultured cells looked similar to immature cloned murine
mast cells, contained histamine, and stained positively for
chloroacetate esterase, amino caproate esterase, eosinophil
major basic protein (MBP) and tryptase. The HMC-1 cells
have, however, lost the ability to synthesize normal IgE
receptors. HMC-1 cells also possess a 10;16 translocation,
which was present in cells initially collected by
leukapheresis from the patient and is therefore not an
artifact of culturing. Thus, HMC-1 cells are a good model
for mast cells.

6.2. CONSTRUCTION OF cDNA LIBRARIES

For inter-library comparisons, the libraries must be
prepared in similar manners. Certain parameters appear to
be particularly important to control. One such parameter
is the method of isolating mRNA. It is important to use
the same conditions to remove DNA and heterogeneous nuclear
RNA from comparison libraries. Size fractionation of cDNA
must be carefully controlled. The same vector preferably
should be used for preparing libraries to be compared. At
the very least, the same type of vector (e.g., a
unidirectional vector) should be used to assure a valid
comparison. A unidirectional vector may be preferred in
order to more easily analyze the output.

It is preferred to prime only with an oligo dT
unidirectional primer in order to obtain only one clone per
mRNA transcript when obtaining cDNAs. However, it is
recognized that employing a mixture of oligo dT and random
primers can also be advantageous, because such a mixture
results in more sequence diversity when gene discovery also
is a goal. Similar effects can be obtained with DR2
(Clontech) and HXLOX (US Biochemical) and also with vectors
from Invitrogen and Novagen. These vectors have two
requirements. First, there must be primer sites for
commercially available primers such as T3 or M13 reverse
primers. Second, the vector must accept inserts up to 10
kb.

It also is important that the clones be randomly
sampled, and that a significant population of clones is
used. Data have been generated with 5,000 clones; however,
if very rare genes are to be obtained and/or their relative
abundance determined, as many as 100,000 clones from a
single library may need to be sampled. Size fractionation
of cDNA also must be carefully controlled. Alternately,
plaques can be selected, rather than clones.
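Why so many clones may be needed can be estimated with a simple sampling model. Assuming, as a simplification not stated in the text, that clones are drawn independently and at random, a transcript that makes up a fraction p of the library appears at least once among n clones with probability 1 - (1 - p)^n. The sketch below evaluates that expression; cloning bias and other real-world effects are ignored.

```python
import math


def prob_detected(p, n):
    """Chance that a transcript at fraction p appears at least once in n clones."""
    return 1.0 - (1.0 - p) ** n


def clones_needed(p, confidence=0.95):
    """Smallest number of clones giving at least `confidence` chance of detection."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# A transcript present at 1 copy in 100,000 is missed about 37% of the
# time even when 100,000 clones are sampled.
print(round(prob_detected(1e-5, 100_000), 2))  # -> 0.63
```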
Besides the Uni-ZAP™ vector system by Stratagene
disclosed below, it is now believed that other similarly
unidirectional vectors also can be used. For example, it
is believed that such vectors include but are not limited
to DR2 (Clontech) and HXLOX (U.S. Biochemical).

Preferably, the details of library construction (as
shown in Figure 1) are collected and stored in a database
for later retrieval relative to the sequences being
compared. Fig. 1 shows important information regarding the
library collaborator or cell or cDNA supplier,
pretreatment, biological source, culture, mRNA preparation
and cDNA construction. Similarly detailed information
about the other steps is beneficial in analyzing sequences
and libraries in depth.

RNA must be harvested from cells and tissue samples,
and cDNA libraries are subsequently constructed. cDNA
libraries can be constructed according to techniques known
in the art (see, for example, Maniatis, T. et al. (1982)
Molecular Cloning, Cold Spring Harbor Laboratory, New
York). cDNA libraries may also be purchased. The U-937
cDNA library (catalog No. 937207) was obtained from
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA
92037.

The THP-1 cDNA library was custom constructed by
Stratagene from THP-1 cells cultured 48 hours with 100 nM
TPA and 4 hours with 1 µg/ml LPS. The human mast cell
HMC-1 cDNA library was also custom constructed by Stratagene
from cultured HMC-1 cells. The HUVEC cDNA library was
custom constructed by Stratagene from two batches of
induced HUVEC cells which were separately processed.

Essentially, all the libraries were prepared in the
same manner. First, poly(A+) RNA (mRNA) was purified. For
the U-937 and HMC-1 RNA, cDNA synthesis was primed only
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis
was primed separately with both oligo dT and random
hexamers, and the two cDNA libraries were treated
separately. Synthetic adaptor oligonucleotides were
ligated onto the cDNA ends, enabling their insertion into
the Uni-ZAP™ vector system (Stratagene), allowing
high-efficiency unidirectional (sense orientation) lambda
library construction and the convenience of a plasmid
system with blue-white color selection to detect clones
with cDNA insertions. Finally, the two libraries were
combined into a single library by mixing equal numbers of
bacteriophage.

The libraries can be screened with either DNA probes
or antibody probes, and the pBluescript® phagemid
(Stratagene) can be rapidly excised in vivo. The phagemid
allows the use of a plasmid system for easy insert
characterization, sequencing, site-directed mutagenesis,
the creation of unidirectional deletions and expression of
fusion proteins. The custom-constructed library phage
particles were used to infect E. coli host strain
XL1-Blue® (Stratagene), which has a high transformation
efficiency, increasing the probability of obtaining rare,
underrepresented clones in the cDNA library.

6.3. ISOLATION OF cDNA CLONES

The phagemid forms of individual cDNA clones were
obtained by the in vivo excision process, in which the host
bacterial strain was coinfected with both the lambda
library phage and an f1 helper phage. Proteins derived
from both the library-containing phage and the helper phage
nicked the lambda DNA, initiated new DNA synthesis from
defined sequences on the lambda target DNA and created a
smaller, single-stranded circular phagemid DNA molecule
that included all DNA sequences of the pBluescript® plasmid
and the cDNA insert. The phagemid DNA was secreted from
the cells and purified, then used to re-infect fresh host
cells, where the double-stranded phagemid DNA was produced.
Because the phagemid carries the gene for beta-lactamase,
the newly transformed bacteria were selected on medium
containing ampicillin.

Phagemid DNA was purified using the Magic Minipreps™
DNA Purification System (Promega catalogue #A7100; Promega
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This
small-scale process provides a simple and reliable method
for lysing the bacterial cells and rapidly isolating
purified phagemid DNA using a proprietary DNA-binding
resin. The DNA was eluted from the purification resin
already prepared for DNA sequencing and other analytical
manipulations.

Phagemid DNA was also purified using the QIAwell-8
Plasmid Purification System from QIAGEN® DNA Purification
System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA
91311). This product line provides a convenient, rapid and
reliable high-throughput method for lysing the bacterial
cells and isolating highly purified phagemid DNA using
QIAGEN anion-exchange resin particles with EMPORE™ membrane
technology from 3M in a multiwell format. The DNA was
eluted from the purification resin already prepared for DNA
sequencing and other analytical manipulations.

An alternate method of purifying phagemid has recently
become available. It utilizes the Miniprep Kit (Catalog
No. 77468, available from Advanced Genetic Technologies
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This
kit is in the 96-well format and provides enough reagents
for 960 purifications. Each kit is provided with a
recommended protocol, which has been employed except for
the following changes. First, the 96 wells are each filled
with only 1 ml of sterile terrific broth with carbenicillin
at 25 mg/L and glycerol at 0.4%. After the wells are
inoculated, the bacteria are cultured for 24 hours and
lysed with 60 µl of lysis buffer. A centrifugation step
(2900 rpm for 5 minutes) is performed before the contents
of the block are added to the primary filter plate. The
optional step of adding isopropanol to TRIS buffer is not
routinely performed. After the last step in the protocol,
samples are transferred to a Beckman 96-well block for
storage.

Another new DNA purification system is the WIZARD™ 
product line which is available from Promega (catalog No. 
A7071) and may be adaptable to the 96-well format. 
6.4. SEQUENCING OF cDNA CLONES

The cDNA inserts from random isolates of the U-937 and
THP-1 libraries were sequenced in part. Methods for DNA
sequencing are well known in the art. Conventional
enzymatic methods employ DNA polymerase Klenow fragment,
Sequenase™ or Taq polymerase to extend DNA chains from an
oligonucleotide primer annealed to the DNA template of
interest. Methods have been developed for the use of both
single- and double-stranded templates. The chain
termination reaction products are usually electrophoresed
on urea-acrylamide gels and are detected either by
autoradiography (for radionuclide-labeled precursors) or by
fluorescence (for fluorescent-labeled precursors). Recent
improvements in mechanized reaction preparation, sequencing
and analysis using the fluorescent detection method have
permitted expansion in the number of sequences that can be
determined per day (for example, with the Applied Biosystems
373 and 377 DNA sequencers and the Catalyst 800). Currently,
with the system as described, read lengths range from 250
to 400 bases and are clone dependent. Read length also
varies with the length of time the gel is run; in general,
shorter runs tend to truncate the sequence. A minimum of
only about 25 to 50 bases is necessary to establish the
identification and degree of homology of the sequence.

Gene transcript imaging can be used with any
sequence-specific method, including, but not limited to,
hybridization, mass spectroscopy, capillary electrophoresis
and 505 gel electrophoresis.

6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND
DEDUCED PROTEIN (and Subsequent Steps)

Using the nucleotide sequences derived from the cDNA
clones as query sequences (sequences of a Sequence
Listing), databases containing previously identified
sequences are searched for areas of homology (similarity).
Examples of such databases include GenBank and EMBL. We
next describe examples of two homology search algorithms
that can be used, and then describe the subsequent
computer-implemented steps to be performed in accordance
with preferred embodiments of the invention.
In the following description of the computer-implemented
steps of the invention, the word "library"
denotes a set (or population) of biological specimen
nucleic acid sequences. A "library" can consist of cDNA
sequences, RNA sequences, or the like, which characterize a
biological specimen. The biological specimen can consist
of cells of a single human cell type (or can be any of the
other above-mentioned types of specimens). We contemplate
that the sequences in a library have been determined so as
to accurately represent or characterize a biological
specimen (for example, they can consist of representative
cDNA sequences from clones of RNA taken from a single human
cell).

In the following description of the computer-implemented
steps of the invention, the expression
"database" denotes a set of stored data which represent a
collection of sequences, which in turn represent a
collection of biological reference materials. For example,
a database can consist of data representing many stored
cDNA sequences which are in turn representative of human
cells infected with various viruses, cells of humans of
various ages, cells from different mammalian species, and
so on.

In preferred embodiments, the invention employs a
computer programmed with software (to be described) for
performing the following steps:

(a) processing data indicative of a library of cDNA
sequences (generated as a result of high-throughput cDNA
sequencing or another method) to determine whether each
sequence in the library matches a DNA sequence of a
reference database of DNA sequences (and if so, identifying
the reference database entry which matches the sequence and
indicating the degree of match between the reference
sequence and the library sequence), and assigning an
identified sequence value, based on the sequence annotation
and degree of match, to each of the sequences in the
library;

(b) for some or all entries of the database,
tabulating the number of matching identified sequence
values in the library, thereby generating a set of final
data values or "abundance numbers" (although this can be
done by hand from a printout of all entries, we prefer to
perform this step using computer software to be described
below); and

(c) if the libraries are different sizes, dividing
each abundance number by the total number of sequences in
the library, to obtain a relative abundance number for each
identified sequence value (i.e., a relative abundance of
each gene transcript).

The list of identified sequence values (or genes
corresponding thereto) can then be sorted by abundance in
the cDNA population. A multitude of additional types of
comparisons or dimensions are possible.

For example (as described below in greater detail),
steps (a) and (b) can be repeated for two different
libraries (sometimes referred to as a "target" library and
a "subtractant" library). Then, for each identified
sequence value (or gene transcript), a "ratio" value is
obtained by dividing the abundance number (for that
identified sequence value) for the target library by the
abundance number (for that identified sequence value) for
the subtractant library.
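The ratio computation can be sketched as below. The text does not say how a transcript absent from the subtractant library is handled, so the choices here (reporting `math.inf`, or adding an optional pseudocount to both libraries) are assumptions made for illustration only.

```python
import math


def ratio_values(target, subtractant, pseudocount=0.0):
    """Divide target abundance by subtractant abundance for each value.

    `target` and `subtractant` map identified sequence values (or gene
    transcripts) to abundance numbers. A value missing from one library
    counts as 0 there. With no pseudocount, a zero denominator is
    reported as math.inf (an illustrative convention, not from the text).
    """
    ratios = {}
    for key in set(target) | set(subtractant):
        num = target.get(key, 0) + pseudocount
        den = subtractant.get(key, 0) + pseudocount
        ratios[key] = num / den if den else math.inf
    return ratios


print(sorted(ratio_values({"a": 4, "b": 2}, {"a": 2, "c": 5}).items()))
# -> [('a', 2.0), ('b', inf), ('c', 0.0)]
```

Because the abundance numbers are stored, the same function can be called with the arguments swapped to obtain the reverse subtraction.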

In fact, subtraction may be carried out on multiple
libraries. It is possible to add the transcripts from
several libraries (for example, three) and then to divide
them by another set of transcripts from multiple libraries
(again, for example, three). Notation for this operation
may be abbreviated as (A+B+C)/(D+E+F), where the capital
letters each indicate an entire library. Optionally, the
abundance numbers of transcripts in the summed libraries
may be divided by the total sample size before subtraction.
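A minimal sketch of the (A+B+C)/(D+E+F) operation follows, assuming each library is held as a `collections.Counter` of abundance numbers. Skipping values that are absent from the denominator pool is an illustrative choice, not something the text specifies.

```python
from collections import Counter


def pooled_ratio(numerator_libs, denominator_libs, normalize=False):
    """Compute (A+B+C)/(D+E+F) for each identified sequence value.

    Each library is a Counter of abundance numbers. With normalize=True,
    each pooled count is first divided by the pooled sample size. Values
    absent from the denominator pool are skipped (an illustrative choice).
    """
    top = sum(numerator_libs, Counter())
    bottom = sum(denominator_libs, Counter())
    if normalize:
        top_total, bottom_total = sum(top.values()), sum(bottom.values())
        top = {k: v / top_total for k, v in top.items()}
        bottom = {k: v / bottom_total for k, v in bottom.items()}
    return {k: top[k] / bottom[k] for k in top if bottom.get(k)}


A, B, C = Counter(x=2, y=1), Counter(x=1), Counter(y=3)
D, E, F = Counter(x=3), Counter(y=2), Counter(z=1)
print(pooled_ratio([A, B, C], [D, E, F]))  # -> {'x': 1.0, 'y': 2.0}
```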

Unlike standard hybridization technology, which permits
a single subtraction of two libraries, once one has
processed a set of library transcript sequences and stored
them in the computer, any number of subtractions can be
performed on the library. For example, by this method,
ratio values can be obtained by dividing relative abundance
values in a first library by corresponding values in a
second library and vice versa.

In variations on step (a), the library consists of
nucleotide sequences derived from cDNA clones. Examples of
databases which can be searched for areas of homology
(similarity) in step (a) include the commercially available
databases known as GenBank (NIH), EMBL (European Molecular
Biology Labs, Germany) and GENESEQ (Intelligenetics,
Mountain View, California).

One homology search algorithm which can be used to
implement step (a) is the algorithm described in the paper
by D.J. Lipman and W.R. Pearson, entitled "Rapid and
Sensitive Protein Similarity Searches," Science, 227:1435
(1985). In this algorithm, the homologous regions are
searched in a two-step manner. In the first step, the
highest homologous regions are determined by calculating a
matching score using a homology score table. The parameter
"Ktup" is used in this step to establish the minimum window
size to be shifted for comparing two sequences. Ktup also
sets the number of bases that must match to extract the
highest homologous region among the sequences. In this
step, no insertions or deletions are applied and the
homology is displayed as an initial (INIT) value.
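The idea behind this first, gap-free step can be sketched by finding every shared word of length ktup and counting the hits on each diagonal (query position minus subject position). This is a simplified stand-in, not the published algorithm: it counts word hits rather than rescoring regions with a homology score table, and the function names are hypothetical.

```python
from collections import defaultdict


def ktup_diagonal_hits(query, subject, ktup=4):
    """Count shared words of length `ktup` on each diagonal (i - j).

    No gaps are allowed, mirroring the first step described above; unlike
    the published algorithm, hits are only counted, not scored with a
    homology score table.
    """
    index = defaultdict(list)  # word -> start positions in subject
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)
    hits = defaultdict(int)
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            hits[i - j] += 1
    return dict(hits)


def best_diagonal(query, subject, ktup=4):
    """Return (diagonal, hit count) for the densest diagonal, or None."""
    hits = ktup_diagonal_hits(query, subject, ktup)
    return max(hits.items(), key=lambda kv: kv[1]) if hits else None


print(best_diagonal("ACGTACGTTT", "GGACGTACGT"))  # -> (-2, 5)
```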

In the second step, the homologous regions are aligned
to obtain the highest matching score by inserting a gap in
order to add a probable deleted portion. The matching
score obtained in the first step is recalculated using the
homology score table and the insertion score table to an
optimized (OPT) value in the final output.
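The effect of allowing gaps can be illustrated with a Smith-Waterman local alignment score, the method the text later mentions for displaying results. In this sketch, simple match and mismatch values stand in for the homology score table and a fixed per-position gap penalty stands in for the insertion score table. All three values are assumptions, not parameters from the source.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    Simple match/mismatch values stand in for the homology score table,
    and a fixed per-position gap penalty stands in for the insertion
    score table.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


# One deleted base: seven matches (14) minus one gap (2).
print(local_alignment_score("ACGTACGT", "ACGTCGT"))  # -> 12
```

Without the gap, the best ungapped alignment of these two sequences scores only 8 (the shared "ACGT"). This contrast corresponds to the difference between the INIT and OPT values.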

DNA homologies between two sequences can be examined
graphically using the Harr method of constructing dot
matrix homology plots (Needleman, S.B. and Wunsch, C.D.,
J. Mol. Biol. 48:443 (1970)). This method produces a
two-dimensional plot which can be useful in determining
regions of homology versus regions of repetition.
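A dot matrix plot of the kind described can be sketched in text form. The windowed filter (a dot only when enough positions in a short window agree) is a common refinement assumed here for illustration; with `window=1` the function yields a plain base-by-base plot.

```python
def dot_matrix(a, b, window=1, min_matches=1):
    """Return a dot plot of a (rows) against b (columns) as strings.

    A '*' is placed at (i, j) when at least `min_matches` of the `window`
    positions starting at a[i] and b[j] are identical; otherwise '.'.
    """
    rows = []
    for i in range(len(a) - window + 1):
        row = []
        for j in range(len(b) - window + 1):
            same = sum(a[i + k] == b[j + k] for k in range(window))
            row.append("*" if same >= min_matches else ".")
        rows.append("".join(row))
    return rows


# A sequence compared with itself: the main diagonal is the self-match,
# and the off-diagonal dots mark the repeated "ACG".
for line in dot_matrix("ACGACG", "ACGACG", window=3, min_matches=3):
    print(line)
```

Diagonal runs parallel to the main diagonal indicate repetition, whereas a single long diagonal between two different sequences indicates a region of homology.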

However, in a class of preferred embodiments, step (a)
is implemented by processing the library data in the
commercially available computer program known as the
INHERIT 670 Sequence Analysis System, available from
Applied Biosystems Inc. (Foster City, California),
including the software known as the Factura software (also
available from Applied Biosystems Inc.). The Factura
program preprocesses each library sequence to "edit out"
portions thereof which are not likely to be of interest,
such as the vector used to prepare the library. Additional
sequences which can be edited out or masked (ignored by the
search tools) include but are not limited to the polyA tail
and repetitive GAG and CCC sequences. A low-end search
program can be written to mask out such "low-information"
sequences, or programs such as BLAST can ignore the
low-information sequences.
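A low-end masking program of the kind mentioned could be as small as the following sketch. It masks a 3' poly(A) tail and tandem runs of GAG or CCC triplets. The minimum tail length and repeat count are illustrative assumptions, and the sketch does not reproduce Factura or BLAST filtering.

```python
import re


def mask_low_information(seq, min_tail=8, min_repeats=4, mask_char="N"):
    """Mask a 3' poly(A) tail and tandem runs of GAG or CCC triplets.

    Masked bases are replaced by `mask_char`, so sequence positions are
    preserved. The thresholds are illustrative, not taken from the text.
    """
    def blank(match):
        return mask_char * len(match.group(0))

    seq = re.sub(r"A{%d,}$" % min_tail, blank, seq)
    seq = re.sub(r"(?:GAG|CCC){%d,}" % min_repeats, blank, seq)
    return seq


read = "ACGT" + "GAG" * 4 + "TTG" + "A" * 10
print(mask_low_information(read))  # 'ACGT' + 12 N + 'TTG' + 10 N
```

Masking with a placeholder character rather than deleting bases keeps positions aligned with the original read. This is convenient when hits are reported by coordinate.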

In the algorithm implemented by the INHERIT 670
Sequence Analysis System, the Pattern Specification
Language (developed by TRW Inc.) is used to determine
regions of homology. "There are three parameters that
determine how INHERIT analysis runs sequence comparisons:
window size, window offset and error tolerance. Window
size specifies the length of the segments into which the
query sequence is subdivided. Window offset specifies
where to start the next segment [to be compared], counting
from the beginning of the previous segment. Error
tolerance specifies the total number of insertions,
deletions and/or substitutions that are tolerated over the
specified word length. Error tolerance may be set to any
integer between 0 and 6. The default settings are window
size=20, window offset=10 and error tolerance=3."
INHERIT Analysis Users Manual, pp. 2-15, Version 1.0,
Applied Biosystems, Inc., October 1991.
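The way the three parameters interact can be sketched as follows, using the quoted defaults as a 20-base window, an offset of 10 and an error tolerance of 3. The query is cut into windows and each window is slid along the subject sequence. This is only an approximation of INHERIT: the sketch tolerates substitutions only, whereas the manual's error tolerance also covers insertions and deletions, and the function names are hypothetical.

```python
def query_windows(query, size=20, offset=10):
    """Split the query into (start, segment) pairs of `size` bases every `offset` bases."""
    return [(s, query[s:s + size]) for s in range(0, len(query) - size + 1, offset)]


def window_hits(query, subject, size=20, offset=10, tolerance=3):
    """For each query window, list subject positions with <= `tolerance` mismatches.

    Substitutions only: insertions and deletions are not modeled here.
    """
    hits = {}
    for start, seg in query_windows(query, size, offset):
        found = [j for j in range(len(subject) - size + 1)
                 if sum(x != y for x, y in zip(seg, subject[j:j + size])) <= tolerance]
        if found:
            hits[start] = found
    return hits


print(window_hits("ACGTAC", "TTACGTAA", size=4, offset=2, tolerance=1))
# -> {0: [2], 2: [0, 4]}
```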
Using a combination of these three parameters, a
database (such as a DNA database) can be searched for
sequences containing regions of homology, and the
appropriate sequences are scored with an initial value.
Subsequently, these homologous regions are examined using
dot matrix homology plots to determine regions of homology
versus regions of repetition. Smith-Waterman alignments
can be used to display the results of the homology search.
The INHERIT software can be executed by a Sun computer
system programmed with the UNIX operating system.
Search alternatives to INHERIT include the BLAST
program, GCG (available from the Genetics Computer Group,
WI) and the Dasher program (Temple Smith, Boston
University, Boston, MA). Nucleotide sequences can be
searched against GenBank, EMBL or custom databases such as
GENESEQ (available from Intelligenetics, Mountain View, CA)
or other databases for genes. In addition, we have
searched some sequences against our own in-house database.

In preferred embodiments, the transcript sequences are
analyzed by the INHERIT software for best conformance with
a reference gene transcript, to assign a sequence identifier
and a degree of homology, which together constitute the
identified sequence value. These values are input into, and
further processed by, a Macintosh personal computer
(available from Apple) programmed with an "abundance sort
and subtraction analysis" computer program (to be described
below).

Prior to the abundance sort and subtraction analysis
program (also denoted as the "abundance sort" program),
identified sequences from the cDNA clones are assigned
values (according to the parameters given above) by degree
of match according to the following categories: "exact"
matches (regions with a high degree of identity),
homologous human matches (regions of high similarity, but
not "exact" matches), homologous non-human matches (regions
of high similarity present in species other than human), or
non-matches (no significant regions of homology to
previously identified nucleotide sequences stored in the
form of the database). Alternately, the degree of match
can be a numeric value as described below.
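The four degree-of-match categories can be expressed as a small classification rule. The text does not give numeric cut-offs, so the identity thresholds below (0.95 for "exact", 0.70 for homologous) and the use of fractional identity as the input are hypothetical choices made only to show the decision structure.

```python
def match_category(identity, species, exact_min=0.95, homolog_min=0.70):
    """Assign a degree-of-match category to one library sequence.

    `identity` is the fraction of identical bases over the best-matching
    region (None when the search found no hit) and `species` is the species
    of that database entry. Both thresholds are hypothetical.
    """
    if identity is None or identity < homolog_min:
        return "non-match"
    if species != "human":
        return "homologous non-human"
    return "exact" if identity >= exact_min else "homologous human"


print(match_category(0.99, "human"))  # -> exact
```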
With reference again to the step of identifying
matches between reference sequences and database entries,
protein and peptide sequences can be deduced from the
nucleic acid sequences. Using the deduced polypeptide
sequence, the match identification can be performed in a
manner analogous to that done with cDNA sequences. A
protein sequence is used as a query sequence and compared
to the previously identified sequences contained in a
database such as the Swiss-Prot, PIR and NBRF Protein
databases to find homologous proteins. These proteins are
initially scored for homology using a homology score table
(Orcutt, B.C. and Dayhoff, M.O. Scoring Matrices, PIR
Report MAT-0285 (February 1985)), resulting in an INIT
score. The homologous regions are aligned to obtain the
highest matching scores by inserting a gap which adds a
probable deleted portion. The matching score is
recalculated using the homology score table and the
insertion score table, resulting in an optimized (OPT)
score. Even in the absence of knowledge of the proper
reading frame of an isolated sequence, the above-described
protein homology search may be performed by searching all
three reading frames.
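Deducing the peptide sequences for all three reading frames can be sketched with the standard genetic code. The sketch translates only the three forward frames, which suits a unidirectional (sense-oriented) library; translating the reverse complement as well would add three more frames. Codons containing ambiguous bases are rendered as "X", which is a convention assumed here.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...;
# '*' marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq):
    """Translate DNA (or RNA) codon by codon; unknown codons become 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def three_frame_translations(seq):
    """Deduced peptide sequences for forward reading frames 1, 2 and 3."""
    return [translate(seq[frame:]) for frame in range(3)]


print(three_frame_translations("ATGGCCTAA"))  # -> ['MA*', 'WP', 'GL']
```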

Peptide and protein sequence homologies can also be
ascertained using the INHERIT 670 Sequence Analysis System
in a way analogous to that used for DNA sequence
homologies. Pattern Specification Language and parameter
windows are used to search protein databases for sequences
containing regions of homology, which are scored with an
initial value. Subsequent display in a dot-matrix homology
plot shows regions of homology versus regions of
repetition. Additional search tools that are available to
use on pattern search databases include PLsearch Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Dasher and GCG. Pattern search
databases include, but are not limited to, Protein Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Brookhaven Protein (available from
the Brookhaven National Laboratory, Brookhaven, NY),
PROSITE (available from Amos Bairoch, University of Geneva,
Switzerland), ProDom (available from Temple Smith, Boston
University) and PROTEIN MOTIF FINGERPRINT (available from
University of Leeds, United Kingdom).

The ABI Assembler application software, part of the
INHERIT DNA analysis system (available from Applied
Biosystems, Inc., Foster City, CA), can be employed to
create and manage sequence assembly projects by assembling
data from selected sequence fragments into a larger
sequence. The Assembler software combines two advanced
computer technologies which maximize the ability to
assemble sequenced DNA fragments into Assemblages, a
special grouping of data where the relationships between
sequences are shown by graphic overlap, alignment and
statistical views. The process is based on the
Myers-Kececioglu model of fragment assembly (INHERIT™
Assembler User's Manual, Applied Biosystems, Inc., Foster
City, CA), and uses graph theory as the foundation of a
very rigorous multiple sequence alignment engine for
assembling DNA sequence fragments. Other assembly programs
that can be used include MEGALIGN (available from DNASTAR
Inc., Madison, WI), Dasher and STADEN (available from Roger
Staden, Cambridge, England).
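The basic idea of fragment assembly, merging fragments whose ends overlap, can be illustrated with a greedy overlap-merge sketch. This is a deliberately simple substitute technique. It is not the graph-theoretic Myers-Kececioglu model used by the Assembler software, it ignores sequencing errors and reverse complements, and the minimum overlap of 3 bases is an assumption.

```python
def overlap(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest exact overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments wholly contained within another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


print(greedy_assemble(["ACGTAC", "GTACGG", "ACGGTT"]))  # -> ['ACGTACGGTT']
```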

Next, with reference to Fig. 2, we describe in more
detail the "abundance sort" program which implements the
above-mentioned "step (b)" to tabulate the number of
sequences of the library which match each database entry
(the "abundance number" for each database entry).

Fig. 2 is a flow chart of a preferred embodiment of
the abundance sort program. A source code listing of this
embodiment of the abundance sort program is set forth in
Table 5. In the Table 5 implementation, the abundance sort
program is written using the FoxBASE programming language
commercially available from Microsoft Corporation.
Although FoxBASE was the program chosen for the first
iteration of this technology, it should not be considered
limiting. Many other programming languages, Sybase being a
particularly desirable alternative, can also be used, as
will be obvious to one with ordinary skill in the art. The
subroutine names specified in Fig. 2 correspond to
subroutines listed in Table 5.

With reference again to Fig. 2, the "Identified
Sequences" are transcript sequences representing each
sequence of the library and a corresponding identification
of the database entry (if any) which it matches. In other
words, the "Identified Sequences" are transcript sequences
representing the output of above-discussed "step (a)."

Fig. 3 is a block diagram of a system for implementing
the invention. The Fig. 3 system includes library
generation unit 2, which generates a library and asserts an
output stream of transcript sequences indicative of the
biological sequences comprising the library. Programmed
processor 4 receives the data stream output from unit 2 and
processes this data in accordance with above-discussed
"step (a)" to generate the Identified Sequences. Processor
4 can be a processor programmed with the commercially
available computer program known as the INHERIT 670
Sequence Analysis System and the commercially available
computer program known as the Factura program (both
available from Applied Biosystems Inc.) and with the UNIX
operating system.

Still with reference to Fig. 3, the Identified
Sequences are loaded into processor 6, which is programmed
with the abundance sort program. Processor 6 generates the
Final Transcript sequences indicated in both Figs. 2 and 3.
Fig. 4 shows a more detailed block diagram of a planned
relational computer system, including various searching
techniques which can be implemented, along with an
assortment of databases to query against.

With reference to Fig. 2, the abundance sort program
first performs an operation known as "Tempnum" on the
Identified Sequences, to discard all of the Identified
Sequences except those which match database entries of
selected types. For example, the Tempnum process can
select Identified Sequences which represent matches of the
following types with database entries (see above for
definitions): "exact" matches; human "homologous" matches;
"other species" matches (representing genes present in
species other than human); "no" matches (no significant
regions of homology with database entries representing
previously identified nucleotide sequences); "I" matches
(Incyte entries for previously unknown DNA sequences); and
"X" matches (matches to ESTs in the reference database).
This eliminates the U, S, M, V, A, R and D sequences (see
Table 1 for definitions).

The identified sequence values selected during the
"Tempnum" process then undergo a further selection (weeding
out) operation known as "Tempred." This operation can, for
example, discard all identified sequence values
representing matches with selected database entries.
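The two weeding-out steps can be sketched as a pair of list filters. This is a minimal Python sketch, not the FoxBASE code of Table 5. It assumes that each identified sequence is a record carrying a one-letter match code (as defined in Table 1) and the name of the matched database entry. The record type, the function names and the "E" code used for an exact match in the demo are all hypothetical; only the dropped codes U, S, M, V, A, R and D come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Match codes that "Tempnum" eliminates, per the text (defined in Table 1).
TEMPNUM_DROP_CODES = {"U", "S", "M", "V", "A", "R", "D"}


@dataclass(frozen=True)
class IdentifiedSequence:
    """Hypothetical record for one clone's identified sequence."""
    clone: int
    library: str
    match_code: str        # one-letter match type, e.g. "I" or "X"
    entry: Optional[str]   # matched database entry, or None


def tempnum(seqs: Iterable[IdentifiedSequence]) -> List[IdentifiedSequence]:
    """Keep only sequences whose match type is one of the selected kinds."""
    return [s for s in seqs if s.match_code not in TEMPNUM_DROP_CODES]


def tempred(seqs: Iterable[IdentifiedSequence],
            excluded_entries: Set[str]) -> List[IdentifiedSequence]:
    """Discard sequences that match user-selected database entries."""
    return [s for s in seqs if s.entry not in excluded_entries]


# Illustrative records; "E" is a placeholder code for an exact match.
seqs = [
    IdentifiedSequence(15365, "HUVEC", "E", "HSRPL41"),
    IdentifiedSequence(15004, "HUVEC", "I", "NCY015004"),
    IdentifiedSequence(15999, "HUVEC", "U", None),
]
kept = tempred(tempnum(seqs), excluded_entries={"HSRPL41"})
print([s.clone for s in kept])
```

Keeping the two filters separate mirrors the order in Fig. 2: Tempnum works on match types, Tempred on specific entries.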

The identified sequence values selected during the
"Tempred" process are then classified according to library
during the "Tempdesig" operation. It is contemplated that
the "Identified Sequences" can represent sequences from a
single library, or from two or more libraries.

Consider first the case in which the identified sequence
values represent sequences from a single library. In this
case, all the identified sequence values determined during
"Tempred" undergo sorting in the "Templib" operation,
further sorting in the "Libsort" operation, and finally
additional sorting in the "Temptarsort" operation. For
example, these three sorting operations can sort the
identified sequences in order of decreasing "abundance
number" (to generate either a single list of decreasing
abundance numbers, each abundance number corresponding to a
unique identified sequence entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type), with redundancies eliminated from each sorted list.
In this case, the operation identified as "Cruncher" can be
bypassed, so that the "Final Data" values are the organized
transcript sequences produced during the "Temptarsort"
operation.
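The single-library path (Templib, Libsort, Temptarsort) can be sketched in three steps: count how many clones match each database entry, collapse the redundant matches into one line per entry, and sort by decreasing count. The sketch assumes that the "abundance number" is this per-entry clone count, which is consistent with the N column of Table 2. The function names and the type labels in the demo are hypothetical. Ties are broken alphabetically only to make the output deterministic.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Ranked = List[Tuple[str, int]]


def abundance_sort(entries: Iterable[str]) -> Ranked:
    """Collapse redundant matches into (entry, abundance number) pairs,
    sorted by decreasing abundance, then by entry name."""
    return sorted(Counter(entries).items(), key=lambda kv: (-kv[1], kv[0]))


def abundance_sort_by_type(typed: Iterable[Tuple[str, str]]) -> Dict[str, Ranked]:
    """Produce one decreasing-abundance list per selected entry type."""
    groups: Dict[str, List[str]] = {}
    for entry_type, entry in typed:
        groups.setdefault(entry_type, []).append(entry)
    return {t: abundance_sort(es) for t, es in groups.items()}


library = ["HSRPL41"] * 3 + ["HSFIB1"] * 2 + ["NCY015004"] * 3
print(abundance_sort(library))
by_type = abundance_sort_by_type(
    [("human", "HSRPL41"), ("incyte", "NCY015004"), ("human", "HSRPL41"),
     ("human", "HSFIB1")])
print(by_type)
```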

We next consider the case in which the transcript
sequences produced during the "Tempred" operation represent
sequences from two libraries (which we will denote the
"target" library and the "subtractant" library). For
example, the target library may consist of cDNA sequences
from clones of a diseased cell, while the subtractant
library may consist of cDNA sequences from clones of the
diseased cell after treatment by exposure to a drug. As
another example, the target library may consist of cDNA
sequences from clones of a cell type from a young human,
while the subtractant library may consist of cDNA sequences
from clones of the same cell type from the same human at a
different age.
In this case, the "Tempdesig" operation routes all
transcript sequences representing the target library for
processing in accordance with "Templib" (and then "Libsort"
and "Temptarsort"), and routes all transcript sequences
representing the subtractant library for processing in
accordance with "Tempsub" (and then "Subsort" and
"Tempsubsort"). For example, the consecutive "Templib,"
"Libsort," and "Temptarsort" sorting operations sort
identified sequences from the target library in order of
decreasing abundance number (to generate a list of
decreasing abundance numbers, each abundance number
corresponding to a database entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type), with redundancies eliminated from each sorted list.
The consecutive "Tempsub," "Subsort," and "Tempsubsort"
sorting operations sort identified sequences from the
subtractant library in the same way, in order of decreasing
abundance number (again yielding either a single list or
several type-specific lists), with redundancies eliminated
from each sorted list.
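The two-library routing performed by Tempdesig can be sketched as a partition of (library, entry) records into a target branch and a subtractant branch. Each branch is then sorted by decreasing abundance, as in the single-library case. The library labels THP-1 and HMC are borrowed from Table 4 purely as names. The records, the helper names and the choice to ignore records from any third library are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Ranked = List[Tuple[str, int]]


def _abundance_sort(entries: Iterable[str]) -> Ranked:
    """(entry, abundance number) pairs, decreasing abundance, no redundancies."""
    return sorted(Counter(entries).items(), key=lambda kv: (-kv[1], kv[0]))


def tempdesig(records: Iterable[Tuple[str, str]], target: str,
              subtractant: str) -> Dict[str, Ranked]:
    """Route (library, entry) records to the target branch (Templib path)
    or the subtractant branch (Tempsub path) and sort each branch."""
    routed: Dict[str, List[str]] = {"target": [], "subtractant": []}
    for library, entry in records:
        if library == target:
            routed["target"].append(entry)
        elif library == subtractant:
            routed["subtractant"].append(entry)
        # records from any other library are ignored in this sketch
    return {branch: _abundance_sort(es) for branch, es in routed.items()}


records = [("THP-1", "HUMIL1"), ("THP-1", "HUMIL1"), ("HMC", "HSPOLYAB"),
           ("THP-1", "HSPOLYAB"), ("HMC", "HSPOLYAB")]
lists = tempdesig(records, target="THP-1", subtractant="HMC")
print(lists)
```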
The transcript sequences output from the "Temptarsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
target library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type). Similarly, the
transcript sequences output from the "Tempsubsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
subtractant library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type).
The transcript sequences (sorted lists) output from
the Tempsubsort and Temptarsort sorting operations are
combined during the operation identified as "Cruncher."
The "Cruncher" process identifies pairs of corresponding
target and subtractant abundance numbers (both representing
the same identified sequence value), divides one by the
other to generate a "ratio" value for each pair of
corresponding abundance numbers, and then sorts the ratio
values in order of decreasing ratio value. The data output
from the "Cruncher" operation (the Final Transcript
sequences in Fig. 2) is typically a sorted list from which a
histogram could be generated in which position along one
axis indicates the size of a ratio of abundance numbers
(for corresponding identified sequence values from target
and subtractant libraries) and position along another axis
indicates identified sequence value (e.g., gene type).
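The core of the Cruncher step can be sketched as follows: pair the abundance numbers recorded for the same entry and rank the resulting ratios. The sketch divides the target abundance by the subtractant abundance, which is the direction used in Table 4. It pairs only entries present in both lists; the handling of absent entries is described separately in Example 6.8. The function name is hypothetical. The two demo entries use the rfend and bgfreq counts printed in Table 4.

```python
from typing import Dict, List, Tuple


def cruncher(target: Dict[str, int],
             subtractant: Dict[str, int]) -> List[Tuple[str, float]]:
    """Pair the abundance numbers of entries found in both libraries,
    divide target by subtractant, and sort by decreasing ratio."""
    shared = target.keys() & subtractant.keys()
    ratios = [(entry, target[entry] / subtractant[entry]) for entry in shared]
    return sorted(ratios, key=lambda kv: (-kv[1], kv[0]))


# Abundance numbers (rfend, bgfreq) for two entries listed in Table 4.
target = {"HUMMIP1A": 121, "HUMCAPPRO": 10}
subtractant = {"HUMMIP1A": 3, "HUMCAPPRO": 1}
for entry, ratio in cruncher(target, subtractant):
    print(f"{entry:<10} {ratio:8.3f}")
```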

Preferably, before the ratio between the two library
abundance values is obtained, the Cruncher operation also
normalizes the abundance values by dividing them by the
total number of sequences in one or both of the target and
subtractant libraries. The resulting lists of "relative"
ratio values generated by the Cruncher operation are useful
for many medical, scientific, and industrial applications.
Also preferably, the output of the Cruncher operation is a
set of lists, each list representing a sequence of
decreasing ratio values for a different selected subset
(e.g., protein family) of database entries.
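The preferred normalization can be read in more than one way. The sketch below follows one reading: divide each library's abundance numbers by that library's total sequence count, take the ratio, and then rank the ratios separately within each protein family. The library totals, the family labels and the entry names in the demo are hypothetical placeholders, not data from this disclosure.

```python
from typing import Dict, List, Tuple


def relative_ratios(target: Dict[str, int], target_total: int,
                    subtractant: Dict[str, int], subtractant_total: int,
                    family: Dict[str, str]) -> Dict[str, List[Tuple[str, float]]]:
    """Ratio of library-size-normalized abundances, one sorted list per family."""
    lists: Dict[str, List[Tuple[str, float]]] = {}
    for entry in target.keys() & subtractant.keys():
        t = target[entry] / target_total            # fraction of target library
        s = subtractant[entry] / subtractant_total  # fraction of subtractant library
        fam = family.get(entry, "unassigned")
        lists.setdefault(fam, []).append((entry, t / s))
    for ranked in lists.values():
        ranked.sort(key=lambda kv: (-kv[1], kv[0]))
    return lists


out = relative_ratios(
    target={"GENE_A": 10, "GENE_B": 4, "GENE_C": 6}, target_total=1000,
    subtractant={"GENE_A": 5, "GENE_B": 8, "GENE_C": 3}, subtractant_total=2000,
    family={"GENE_A": "kinase", "GENE_B": "kinase", "GENE_C": "receptor"},
)
print(out)
```

Without normalization, GENE_A would score 2.0 rather than 4.0, because the subtractant library in this example is twice the size of the target library.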

In one example, the abundance sort program of the
invention tabulates for a library the numbers of mRNA
transcripts corresponding to each gene identified in a
database. These numbers are divided by the total number of
clones sampled. The results of the division reflect the
relative abundance of the mRNA transcripts in the cell type
or tissue from which they were obtained. Obtaining this
final data set is referred to herein as "gene transcript
image analysis." The resulting subtracted data show in
considerable detail which proteins and genes are
upregulated and downregulated.
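The tabulate-and-divide step can be sketched as follows, using the total number of clones sampled as the denominator. The demo takes two of the most abundant named genes of Table 2, with its stated total of 5000 clones analyzed. The function name is hypothetical.

```python
from typing import Dict


def transcript_image(counts: Dict[str, int], clones_sampled: int) -> Dict[str, float]:
    """Relative abundance of each transcript: clone count / clones sampled."""
    if clones_sampled <= 0:
        raise ValueError("clones_sampled must be positive")
    return {entry: n / clones_sampled for entry, n in counts.items()}


# Counts for two named genes from Table 2 (HUVEC, 5000 clones analyzed).
image = transcript_image({"HSRPL41": 67, "HSFIB1": 47}, clones_sampled=5000)
for entry, fraction in image.items():
    print(f"{entry:<8} {fraction:.4%}")
```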
6.6. HUVEC cDNA LIBRARY

Table 2 is an abundance table listing the various gene
transcripts in an induced HUVEC library. The transcripts
are listed in order of decreasing abundance. This
computerized sorting simplifies analysis of the tissue and
speeds identification of significant new proteins which are
specific to this cell type. Endothelial cells of this type
line the tissues of the cardiovascular system, and the more
that is known about their composition, particularly in
response to activation, the more protein targets become
available for treating disorders of this tissue, such as
the highly prevalent atherosclerosis.

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES

Tables 3 and 4 show truncated comparisons of two
libraries. In Tables 3 and 4 the "normal monocytes" are
the HMC-1 cells, and the "activated macrophages" are the
THP-1 cells pretreated with PMA and activated with LPS.
Table 3 lists in descending order of abundance the most
abundant gene transcripts for both cell types. With only
15 gene transcripts from each cell type, this table permits
quick, qualitative comparison of the most common
transcripts. This abundance sort, with its convenient
side-by-side display, provides an immediately useful
research tool. In this example, the tool discloses that
1) only one of the top 15 activated macrophage transcripts
(poly A binding protein) is found among the top 15 normal
monocyte gene transcripts; and 2) a new gene transcript
(previously unreported in other databases) is relatively
highly represented in activated macrophages but is not
similarly prominent in normal monocytes. Such a research
tool provides researchers with a short-cut to new proteins,
such as receptors and cell-surface and intracellular
signalling molecules, which can serve as drug targets in
commercial drug screening programs. Such a tool could save
considerable time over that consumed by a hit-and-miss
discovery program aimed at identifying important proteins
in and around cells, because those proteins carrying out
everyday cellular functions and represented as steady-state
mRNA are quickly eliminated from further characterization.
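The side-by-side comparison of Table 3 amounts to intersecting the top-N entries of two abundance lists. The sketch below is hypothetical. The counts in the demo are invented for illustration, and it uses N = 2 rather than the 15 of Table 3.

```python
from typing import Dict, List, Set


def top_n(abundance: Dict[str, int], n: int) -> List[str]:
    """Entries with the n largest abundance numbers (ties broken by name)."""
    ranked = sorted(abundance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [entry for entry, _ in ranked[:n]]


def shared_top(a: Dict[str, int], b: Dict[str, int], n: int = 15) -> Set[str]:
    """Entries that appear in the top-n lists of both libraries."""
    return set(top_n(a, n)) & set(top_n(b, n))


activated = {"GENE_A": 40, "POLYA_BP": 12, "GENE_B": 9}
normal = {"GENE_C": 30, "POLYA_BP": 20, "GENE_A": 2}
print(shared_top(activated, normal, n=2))
```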

This illustrates how gene transcript profiles change
with altered cellular function. Those skilled in the art
know that the biochemical composition of cells also changes
with other functional changes, such as cancer (including
its various stages) and exposure to toxic agents. A gene
transcript subtraction profile such as in Table 3 is useful
as a first screening tool for such gene expression and
protein studies.

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND
ACTIVATED MONOCYTE-CELL cDNA LIBRARIES

Once the cDNA data were in the computer, the computer
program disclosed in Table 5 was used to obtain ratios of
all the gene transcripts in the two libraries discussed in
Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance in that library is unknown but appears to be less
than 1. As an approximation, and to obtain a ratio (which
would not be possible if the unrepresented gene were given
an abundance of zero), genes which are represented in only
one of the two libraries are assigned an abundance of 1/2
in the library from which they are absent. Using 1/2 for
unrepresented clones increases the relative importance of
"turned-on" and "turned-off" genes, whose products would be
drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
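The 1/2 substitution can be checked against Table 4. In the sketch below, a transcript absent from one library is given an abundance of 1/2 in that library. The target (rfend) abundance is then divided by the background (bgfreq) abundance. With the counts printed in Table 4, this reproduces the listed ratios: 262.00 for HUMIL1 (131/0.5), 40.333 for HUMMIP1A (121/3) and 10.000 for HUMCAPPRO (10/1). The function and constant names are hypothetical.

```python
from typing import Dict, List, Tuple

MISSING_ABUNDANCE = 0.5  # stand-in abundance for a transcript absent from a library

Row = Tuple[str, int, int, float]  # (entry, bgfreq, rfend, ratio)


def subtraction_table(target: Dict[str, int],
                      background: Dict[str, int]) -> List[Row]:
    """Ratio of target to background abundance for every entry seen in
    either library, sorted by decreasing ratio."""
    rows: List[Row] = []
    for entry in target.keys() | background.keys():
        rfend = target.get(entry, 0)
        bgfreq = background.get(entry, 0)
        ratio = (rfend or MISSING_ABUNDANCE) / (bgfreq or MISSING_ABUNDANCE)
        rows.append((entry, bgfreq, rfend, ratio))
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows


target = {"HUMIL1": 131, "HUMMIP1A": 121, "HUMCAPPRO": 10}   # THP-1 counts
background = {"HUMMIP1A": 3, "HUMCAPPRO": 1}                 # HMC counts
for entry, bgfreq, rfend, ratio in subtraction_table(target, background):
    print(f"{entry:<10}{bgfreq:>4}{rfend:>6}{ratio:>10.3f}")
```

A transcript present only in the background library receives a ratio below 1 (for example, 0.5/4 = 0.125). This places "turned-off" genes at the bottom of the table.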

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table most effectively
highlights the changes in abundance of the gene transcripts
caused by activation of macrophages. Even among the first
20 gene transcripts listed, there are several unknown gene
transcripts. Thus, electronic subtraction is a useful tool
with which to assist researchers in identifying much more
quickly the basic biochemical changes between two cell
types. Such a tool can save universities and pharmaceutical
companies, which spend billions of dollars on research,
valuable time and laboratory resources at the early
discovery stage, and can speed up the drug development
cycle, which in turn permits researchers to set up drug
screening programs much earlier. Thus, this research tool
provides a way to get new drugs to the public faster and
more economically.

Also, such a subtraction table can be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte library had a 0 in
the bgfreq column). This screening method is superior to
other screening techniques, such as the western blot, which
are incapable of uncovering such a multitude of discrete
new gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributed to the use of immortalized cell lines and are
inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts, including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions alone.

Smooth muscle senescent protein (22 kDa) was
upregulated in the activated macrophage, which indicates
that it is a candidate target to block in controlling
inflammation.
6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND
HEPATITIS-INFECTED LIVER CELL cDNA LIBRARIES

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half are treated with a new anti-hepatitis agent (AHA).
Liver samples are obtained from all rats before exposure to
the hepatitis virus and at the end of AHA treatment or no
treatment. In addition, liver samples can be obtained from
rats with hepatitis just prior to AHA treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample is processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile for all control
samples, all samples from infected rats, and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, the data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.
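Pooling and averaging before subtracting can be sketched in two steps. First, average each entry's relative abundance (clone count divided by clones sampled) across the animals of a group. Second, take the ratio of one group average to another. This sketch makes three assumptions: it averages relative abundances, it counts a missing entry as zero within a group, and it reports only entries present in both averages. The names and counts are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, int], int]  # (clone counts per entry, clones sampled)


def group_average(samples: List[Sample]) -> Dict[str, float]:
    """Mean relative abundance per entry over a group (absent entry = 0)."""
    entries = set().union(*(counts.keys() for counts, _ in samples))
    return {
        e: sum(counts.get(e, 0) / total for counts, total in samples) / len(samples)
        for e in entries
    }


def subtract(target_avg: Dict[str, float],
             control_avg: Dict[str, float]) -> List[Tuple[str, float]]:
    """Ratio of target to control averages, sorted by decreasing ratio."""
    shared = target_avg.keys() & control_avg.keys()
    ratios = [(e, target_avg[e] / control_avg[e]) for e in shared]
    return sorted(ratios, key=lambda kv: (-kv[1], kv[0]))


controls = [({"GENE_A": 40, "GENE_B": 2}, 1000), ({"GENE_A": 60, "GENE_B": 2}, 1000)]
untreated = [({"GENE_A": 50, "GENE_B": 10}, 1000), ({"GENE_A": 50, "GENE_B": 6}, 1000)]
print(subtract(group_average(untreated), group_average(controls)))
```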

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provide a much more
complex biochemical picture, which researchers have needed
in order to analyze such difficult problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequently,
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, tests of
blood or other relevant samples from subjects known to
have, and known to lack, the relevant condition are
compared to validate the selection of the clinical marker.
In this method, the particular physiologic function of the
protein transcript need not be determined to qualify the
gene transcript as a clinical marker.
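The first-pass marker screen described above can be sketched as set operations on abundance tables. First, keep the transcripts that are turned on or off (present in only one of the two libraries). Then drop any candidate that also appears in the additional cell and tissue libraries. Validation against patient samples is outside the sketch. All names and counts are hypothetical.

```python
from typing import Dict, Iterable, Set


def marker_candidates(condition: Dict[str, int], control: Dict[str, int],
                      other_libraries: Iterable[Dict[str, int]]) -> Set[str]:
    """Turned-on or turned-off transcripts that are absent from every
    additional cell and tissue library."""
    turned_on = condition.keys() - control.keys()
    turned_off = control.keys() - condition.keys()
    candidates = set(turned_on | turned_off)
    for library in other_libraries:
        candidates.difference_update(library)  # iterating a dict yields its keys
    return candidates


condition = {"GENE_A": 5, "GENE_B": 3, "GENE_C": 4}
control = {"GENE_B": 2, "GENE_D": 6}
others = [{"GENE_C": 1, "GENE_E": 9}]
print(sorted(marker_candidates(condition, control, others)))
```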

6.10. ELECTRONIC NORTHERN BLOT
One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one gene
at a time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined almost immediately
and with little effort. More candidate genes can thus be
scanned, leading to more frequent and more relevant
discoveries. The computer program included as Table 5
includes a program for performing this function, and Table
6 is a partial listing of entries of the database used in
the electronic northern blot analysis.
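An electronic northern blot query can be sketched as looking up one entry in every library's abundance table and reporting its clone count and relative abundance per library. This is a hypothetical sketch, not the Table 5 program. The HUVEC counts and the 5000-clone total come from Table 2. LIBRARY_X, LIBRARY_Y and their counts and totals are invented.

```python
from typing import Dict, List, Tuple


def electronic_northern(entry: str,
                        libraries: Dict[str, Dict[str, int]],
                        clones_sampled: Dict[str, int]) -> List[Tuple[str, int, float]]:
    """Expression of one entry across all libraries:
    (library, clone count, fraction of clones sampled), highest first."""
    rows = []
    for name, counts in libraries.items():
        n = counts.get(entry, 0)
        rows.append((name, n, n / clones_sampled[name]))
    rows.sort(key=lambda r: (-r[2], r[0]))
    return rows


libraries = {
    "HUVEC": {"HSPOLYAB": 14, "HSRPL41": 67},   # counts from Table 2
    "LIBRARY_X": {"HSPOLYAB": 3},               # hypothetical
    "LIBRARY_Y": {"HSRPL41": 20},               # hypothetical
}
clones = {"HUVEC": 5000, "LIBRARY_X": 2000, "LIBRARY_Y": 4000}
for name, n, frac in electronic_northern("HSPOLYAB", libraries, clones):
    print(f"{name:<10}{n:>4}  {frac:.4%}")
```

Reporting the fraction rather than the raw count keeps libraries of different sizes comparable, much as equal RNA loading does on a laboratory blot.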

6.11. PHASE I CLINICAL TRIALS
Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients are subjected to the usual
preliminary clinical laboratory tests. In addition,
appropriate specimens are taken and subjected to gene
transcript analysis. Additional patient specimens are
taken at predetermined intervals during the test. The
specimens are subjected to gene transcript analysis as
described above. In addition, the gene transcript changes
noted in the earlier rat toxicity study are carefully
evaluated as clinical markers in the followed patients.
Changes in the gene transcript analyses are evaluated as
indicators of toxicity by correlation with clinical signs
and symptoms and other laboratory results. In addition,
subtraction is performed on individual patient specimens
and on averaged patient specimens. The subtraction
analysis highlights any toxicological changes in the
treated patients. This is a highly refined determinant of
toxicity. The subtraction method also annotates clinical
markers. Further subgroups can be analyzed by subtraction
analysis, including, for example, 1) segregation by
occurrence and type of adverse effect; and 2) segregation
by dosage.

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES

A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and on drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES

The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al.; PCT International
Application Publication No. WO 9322684, published November
11, 1993; PCT International Application Publication No. WO
9306121, published April 1, 1993; or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby expressly incorporated by reference herein.

Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific
embodiments.
TABLE 2

Clone numbers 15000 through 20000
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000
319 genes, for a total of 1713 Clones

  #  number   N  c  entry      s  descriptor
  1  15365   67     HSRPL41       Riboptn L41
  2  15004   65     NCY015004     INCYTE 015004
  3  15638   63     NCY015638     INCYTE 015638
  4  15390   50     NCY015390     INCYTE 015390
  5  15193   47     HSFIB1        Fibronectin
  6  15220   47     RRRPL9     R  Riboptn L9
  7  15280   47     NCY015280     INCYTE 015280
  8  15583   33     M62060        EST HHCH09 (ICR)
  9  15662   31     HSACTCGR      Actin, gamma
 10  15026   29     NCY015026     INCYTE 015026
 11  15279   24     HSEF1AR       Elf 1-alpha
 12  15027   23     NCY015027     INCYTE 015027
 13  15033   20     NCY015033     INCYTE 015033
 14  15198   20     NCY015198     INCYTE 015198
 15  15809   20     HSCOLL1       Collagenase
 16  15221   19     NCY015221     INCYTE 015221
 17  15263   19     NCY015263     INCYTE 015263
 18  15290   19     NCY015290     INCYTE 015290
 19  15350   18     NCY015350     INCYTE 015350
 20  15030   17     NCY015030     INCYTE 015030
 21  15234   17     NCY015234     INCYTE 015234
 22  15459   16     NCY015459     INCYTE 015459
 23  15353   15     NCY015353     INCYTE 015353
 24  15378   15     S76965        Ptn kinase inhib
 25  15255   14     HUMTHYB4      Thymosin beta-4
 26  15401   14     HSLIPCR       Lipocortin I
 27  15425   14     HSPOLYAB      Poly-A bp
 28  18212   14     HUMTHYMA      Thymosin, alpha
 29  18216   14     HSMRP1        Motility relat ptn; MRP-1; CD-9
 30  15189   13     HS18D         Interferon induc ptn 1-8D
 31  15031   12     HUMFKBP       FK506 bp
 32  15306   12     HSH2AZ        Histone H2A
 33  15621   12     HUMLEC        Lectin, B-galbp, 14kDa
 34  15789   11     NCY015789     INCYTE 015789
 35  16578   11     HSRPS11       Riboptn S11
 36  16632   11     M61984        EST HHCA13 (ICR)
 37  18314   11     NCY018314     INCYTE 018314
 38  15367   10     NCY015367     INCYTE 015367
 39  15415   10     HSIFNIN1      Interferon induc mRNA
 40  15633   10     HSLDHAR       Lactate dehydrogenase
 41  15813   10     CHKNMHCB      C Myosin heavy chain B
 42  18210   10     NCY018210     INCYTE 018210
 43  18233   10     HSRPII140     RNA polymerase II
 44  18996   10     NCY018996     INCYTE 018996
 45  15088    9     HUMFERL       Ferritin, light chain
 46  15714    9     NCY015714     INCYTE 015714
 47  15720    9     NCY015720     INCYTE 015720
 48  15863    9     NCY015863     INCYTE 015863
 49  16121    9     HSET          Endothelin
 50  18252    9     NCY018252     INCYTE 018252
 51  15351    8     HUMALBP       Lipid bp, adipocyte
 52  15370    8     NCY015370     INCYTE 015370
 53  15670    8     BTCIASHI   V  NADH-ubiq oxidoreductase
 54  15795    8     NCY015795     INCYTE 015795
 55  16245    8     NCY016245     INCYTE 016245
 56  18262    8     NCY018262     INCYTE 018262
 57  18321    8     HSRPL17       Riboptn L17
 58  15126    7     XLRPL1BR   F  Riboptn L1
 59  15133    7     HSAC07        Actin, beta
 60  15245    7     NCY015245     INCYTE 015245
 61  15288    7     NCY015288     INCYTE 015288
 62  15294    7     HSGAPDR       G-3-PD
 63  15442    7     HUMLAMB       Laminin receptor, 54kDa
 64  15485    7     HSNGMRNA      Uracil DNA glycosylase
 65  16646    7     NCY016646     INCYTE 016646
 66  18003    7     HUMPAIA       Plsmnogen activ gene
 67  15032    6     HUMUB         Ubiquitin
 68  15267    6     HSRPS8        Riboptn S8
 69  15295    6     NCY015295     INCYTE 015295
 70  15458    6     RNRPS10R   R  Riboptn S10
 71  15832    6     RSGALEM    R  UDP-galactose epimerase
 72  15928    6     HUMAPOJ       Apolipoptn J
 73  16598    6     HUMTBBM40     Tubulin, beta
 74  ?        6     NCY018218     INCYTE 018218
 75  18499    6     HSP27         Hydrophobic ptn p27
 76  18963    6     NCY018963     INCYTE 018963
 77  18997    6     NCY018997     INCYTE 018997
 78  15432    5     HSAGALAR      Galactosidase A, alpha
 79  15475    5     NCY015475     INCYTE 015475
 80  15721    5     NCY015721     INCYTE 015721
 81  15865    5     NCY015865     INCYTE 015865
 82  16270    5     NCY016270     INCYTE 016270
 83  16886    5     NCY016886     INCYTE 016886
 84  18500    5     NCY018500     INCYTE 018500
 85  18503    5     NCY018503     INCYTE 018503
 86  19672    5     RRRPL34    R  Riboptn L34
 87  15086    4     XLRPL1AR   F  Riboptn L1a
 88  15113    4     HUMIFNWRS     tRNA synthetase, trp
 89  15242    4     NCY015242     INCYTE 015242
 90  15249    4     NCY015249     INCYTE 015249
 91  15377    4     NCY015377     INCYTE 015377
 92  15407    4     NCY015407     INCYTE 015407
 93  15473    4     NCY015473     INCYTE 015473
 94  15588    4     HSRPS12       Riboptn S12
 95  15684    4     HSEF1G        Elf 1-gamma
 96  15782    4     NCY015782     INCYTE 015782
 97  15916    4     HSRPS18       Riboptn S18
 98  15930    4     NCY015930     INCYTE 015930
 99  16108    4     NCY016108     INCYTE 016108
100  16133    4     NCY016133     INCYTE 016133


WO 95/20681 



PCT/US95/01160 






TABLE 4

Libraries: THP-1
Subtracting: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375
1057 genes, for a total of 2151 clones

number entry      s  descriptor                  bgfreq rfend   ratio
10022  HUMIL1        IL 1-beta                        0   131  262.00
10036  HSMDNCF       IL-8                             0   119  238.00
10089  HSLAG1CDN     Lymphocyte activ gene            0    71  142.00
10060  HUMTCSM       RANTES                           0    23  46.000
10003  HUMMIP1A      MIP-1                            3   121  40.333
10689  HSOP          Osteopontin                      0    20  40.000
11050  NCY011050     INCYTE 011050                    0    17  34.000
10937  HSTNFR        TNF-alpha                        0    17  34.000
10176  HSSOD         Superoxide dismutase             0    14  28.000
10886  HSCDW40       B-cell activ, NGF-relat          0    10  20.000
10186  HUMAPR        Early resp PMA-induc             0     9  18.000
10967  HUMGDN        PN-1, glial-deriv                0     9  18.000
11353  NCY011353     INCYTE 011353                    0     8  16.000
10298  NCY010298     INCYTE 010298                    0     7  14.000
10215  HUM4COLA      Collagenase, type IV             0     6  12.000
10276  NCY010276     INCYTE 010276                    0     6  12.000
10488  NCY010488     INCYTE 010488                    0     6  12.000
11138  NCY011138     INCYTE 011138                    0     6  12.000
10037  HUMCAPPRO     Adenylate cyclase                1    10  10.000
10840  HUMADCY       Adenylate cyclase                0     5  10.000
10672  HSCD44E       Cell adhesion glptn              0     5  10.000
12837  HUMCYCLOX     Cyclooxygenase-2                 0     5  10.000
10001  NCY010001     INCYTE 010001                    0     5  10.000
10005  NCY010005     INCYTE 010005                    0     5  10.000
10294  NCY010294     INCYTE 010294                    0     5  10.000
10297  NCY010297     INCYTE 010297                    0     5  10.000
10403  NCY010403     INCYTE 010403                    0     5  10.000
10699  NCY010699     INCYTE 010699                    0     5  10.000
10966  NCY010966     INCYTE 010966                    0     5  10.000
12092  NCY012092     INCYTE 012092                    0     5  10.000
12549  HSRHOB        Oncogene rho                     0     5  10.000
10691  HUMARF1BA     ADP-ribosylation fctr            0     4   8.000
12106  HSADSS        Adenylosuccinate synthetase      0     4   8.000
10194  HSCATHL       Cathepsin L                      0     4   8.000
10479  CLMCYCA    I  Cyclin A                         0     4   8.000
10031  NCY010031     INCYTE 010031                    0     4   8.000
10203  NCY010203     INCYTE 010203                    0     4   8.000
10288  NCY010288     INCYTE 010288                    0     4   8.000
10372  NCY010372     INCYTE 010372                    0     4   8.000
10471  NCY010471     INCYTE 010471                    0     4   8.000
10484  NCY010484     INCYTE 010484                    0     4   8.000
10859  NCY010859     INCYTE 010859                    0     4   8.000
10890  NCY010890     INCYTE 010890                    0     4   8.000
11511  NCY011511     INCYTE 011511                    0     4   8.000
11868  NCY011868     INCYTE 011868                    0     4   8.000
12820  NCY012820     INCYTE 012820                    0     4   8.000
10133  HSI1RAP       IL-1 antagonist                  0     4   8.000
10516  HUMP2A        Phosphatase, regul 2A            0     4   8.000
11063  HUMB94        TNF-induc response               0     4   8.000
11140  HSHB15RNA     HB15 gene; new Ig                0     3   6.000
10788  NCY001713     INCYTE 001713                    0     3   6.000
10033  NCY010033     INCYTE 010033                    0     3   6.000
10035  NCY010035     INCYTE 010035                    0     3   6.000
10084  NCY010084     INCYTE 010084                    0     3   6.000
10236  NCY010236     INCYTE 010236                    0     3   6.000
10383  NCY010383     INCYTE 010383                    0     3   6.000
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TABLE 4 Con't 



number 


entry 


s descriptor 


bgf req 


r f Grid, 


ratio 


10450 


NCY010450 


INCYTE 


010450 


0 


3 


6 . 000 


10470 


NCY010470 


INCYTE 


010470 


0 


3 


6. 000 


10504 


NCY010504 


INCYTE 


010504 


0 


3 


6 . 000 


10507 


NCY010507 


INCYTE 


010507 


0 


3 


6 ♦ 000 


10598 


NCY010598 


INCYTE 


010598 


0 


3 


o . 000 


f» n **f r> A 

10779 


NCY010779 


INCYTE 


010779 


0 


3 


6 . 000 


10909 


NCY0109 09 


INCYTE 


010909 


0 


3 


o * 000 


10976 


NCY0109 7 6 


INCYTE 


010976 


0 


3 


6 - 000 


10985 


NCY010985 


INCYTE 


010985 


0 


3 


6 . 000 


11052 


NCY011052 


INCYTE 


011052 


0 


3 


6. 000 


11068 


NCY011068 


INCYTE 


011068 


0 


3 


6.000 


11134 


NCY011134 


INCYTE 


011134 


0 


3 


6*000 


11136 


NCY011136 


INCYTE 


011136 


0 


3 


6.000 


11191 


NCY0U191 


INCYTE 


011191 


0 


3 


6. 000 


11219 


NCY011219 


INCYTE 


011219 


0 


3 


6. 000 


11386 


NCY011386 


INCYTE 


011386 


0 


3 


6. 000 


11403 


NCY011403 


INCYTE 


011403 


0 


3 


6.000 


11460 


NCY011460 


INCYTE 


011460 


0 


3 


6. 000 


11618 


NCY011618 


INCYTE 


011618 


0 


3 


6.000 


11686 


KCY011686 


INCYTE 


011686 


0 


3 


6.000 


12021 


NCY012021 


INCYTE 


012021 


0 


3 


6.000 


12025 


NCY012025 


INCYTE 


012025 


0 


3 


6.000 


12320 


NCY012320 


INCYTE 


012320 


0 


3 


6.000 


12330 


NCY012330 


INCYTE 


012330 


0 


3 


6.000 


12853 


NCY012853 


INCYTE 


012853 


0 


3 


6.000 


14386 


NCY014386 


INCYTE 


014386 


0 


3 


6.000 


14391 


NCY014391 


INCYTE 


014391 


0 


3 


6.000 
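The ratio column of Table 4 can be reproduced from the bgfreq and rfend columns. The Python sketch below is illustrative only. It assumes, following the FUSION ROUTINE in Table 5 (STORE 1/2 TO BIT1 when LOCATE finds no match), that an entry absent from the subtracted library is divided by a pseudo-count of 0.5 rather than by zero. The function name abundance_ratio is hypothetical.

```python
def abundance_ratio(rfend: int, bgfreq: int) -> float:
    """Target-library clone count divided by background clone count.

    Assumption (mirrors the FUSION ROUTINE): when the entry is absent from
    the subtracted library (bgfreq == 0), divide by 0.5 instead of 0.
    """
    actual = bgfreq if bgfreq > 0 else 0.5
    return rfend / actual


# A few rows copied from Table 4: (entry, bgfreq, rfend, printed ratio)
rows = [
    ("HUMIL1", 0, 131, 262.00),
    ("HUMMIP1A", 3, 121, 40.333),
    ("HUMCAPPRO", 1, 10, 10.000),
    ("HSCATHL", 0, 4, 8.000),
]

for entry, bgfreq, rfend, printed in rows:
    print(f"{entry:<10} computed={abundance_ratio(rfend, bgfreq):8.3f} printed={printed}")
```

For HUMMIP1A, 121 / 3 = 40.333..., matching the printed 40.333; for HUMIL1, 131 / 0.5 = 262.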
TABLE 5

* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF
SET EXACT ON
SET TYPEAHEAD TO 0
CLEAR
SET DEVICE TO SCREEN
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE ' ' TO Target1
STORE ' ' TO Target2
STORE ' ' TO Target3
STORE ' ' TO Object1
STORE ' ' TO Object2
STORE ' ' TO Object3
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTF
STORE 1 TO BAIL
DO WHILE .T.

* Program.: Subtraction 2.fmt
* Date....: 10/11/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Subtraction 2
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 27,134 SAY "Subtraction Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 117,126 GET EMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact" SIZE 15,62 CO
@ PIXELS 135,126 GET HMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1
@ PIXELS 153,126 GET OMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spec" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,126 GET IMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO
@ PIXELS 252,137 GET INITIATE STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,236 GET TERMINATE STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,35 SAY "Include clones" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,215 SAY "->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 198,126 GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 90,9 TO 181,109 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 90,288 TO 181,397 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 81,296 SAY "Background:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 45,135 GET ANAL STYLE 65536 FONT "Chicago",12 PICTURE "@*R Overall;Function" SIZE 4
@ PIXELS 81,38 SAY "Target:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 108,20 GET Target1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,20 GET Target2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 162,20 GET Target3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 108,299 GET Object1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,299 GET Object2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 162,299 GET Object3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 276,324 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Run;Bail out" SIZE 4112
*
* EOF: Subtraction 2.fmt
READ
IF Bail=2
CLEAR
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF

STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR
SET TALK ON
GAP = TERMINATE-INITIATE+1
GO INITIATE
COPY NEXT GAP FIELDS NUMBER,LIBRARY,D,F,Z,R,ENTRY,S,DESCRIPTOR,START,RFEND,I TO TEMPNUM
USE TEMPNUM
COUNT TO TOT
COPY TO TEMPRED FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='I'
USE TEMPRED
IF EMATCH=0 .AND. HMATCH=0 .AND. OMATCH=0 .AND. IMATCH=0
COPY TO TEMPDESIG
ELSE
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF EMATCH=1
APPEND FROM TEMPNUM FOR D='E'
ENDIF
IF HMATCH=1
APPEND FROM TEMPNUM FOR D='H'
ENDIF
IF OMATCH=1
APPEND FROM TEMPNUM FOR D='O'
ENDIF
IF IMATCH=1
APPEND FROM TEMPNUM FOR D='I' .OR. D='X'
ENDIF
ENDIF

COUNT TO STARTOT
* Target (query) libraries are collected in TEMPLIB
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
APPEND FROM TEMPDESIG FOR LIBRARY=UPPER(Target1)
IF Target2<>' '
APPEND FROM TEMPDESIG FOR LIBRARY=UPPER(Target2)
ENDIF
IF Target3<>' '
APPEND FROM TEMPDESIG FOR LIBRARY=UPPER(Target3)
ENDIF
COUNT TO ANALTOT
* Background (subtracted) libraries are collected in TEMPSUB
USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR LIBRARY=UPPER(Object1)
IF Object2<>' '
APPEND FROM TEMPDESIG FOR LIBRARY=UPPER(Object2)
ENDIF
IF Object3<>' '
APPEND FROM TEMPDESIG FOR LIBRARY=UPPER(Object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF



* COMPRESSION SUBROUTINE A
* Collapse clones sharing the same ENTRY and designation D into one
* record; RFEND ends up holding the number of clones collapsed.
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY, NUMBER TO LIBSORT
USE LIBSORT
COUNT TO IDGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
IF MARK1 >= IDGENE
PACK
COUNT TO AUNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA = DESIGB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
SORT ON RFEND/D, NUMBER TO TEMPTARSORT
USE TEMPTARSORT
*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO

* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY, NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
IF MARK1 >= SUBGENE
PACK
COUNT TO BUNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA = DESIGB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
SORT ON RFEND/D, NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO

* FUSION ROUTINE
? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
*
DO WHILE .T.
SELECT 1
MARK = MARK+1
IF MARK>BAILOUT
EXIT
ENDIF
GO MARK
STORE ENTRY TO SCANNER
SELECT 2
LOCATE FOR ENTRY=SCANNER
IF FOUND()
STORE RFEND TO BIT1
STORE RFEND TO BIT2
ELSE
* Entry absent from the subtracted library: BGFREQ = 0, divisor 1/2
STORE 1/2 TO BIT1
STORE 0 TO BIT2
ENDIF
SELECT 1
REPLACE BGFREQ WITH BIT2
REPLACE ACTUAL WITH BIT1
LOOP
ENDDO
SELECT 1
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'
SORT ON RATIO/D, BGFREQ/D, DESCRIPTOR TO FINAL
USE FINAL

SET TALK OFF
DO CASE
CASE PTF=0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF=1
SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
SET ALTERNATE ON
ENDCASE
STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME
STORE FINTIME+86400 TO FINTIME
ENDIF
STORE FINTIME-STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN

*****************************
@ 1,30 SAY "Library Subtraction Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,
?
?
?
?
? DATE()
?? ' '
?? TIME()

? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
?? Target1
IF Target2<>' '
?? ', '
?? Target2
ENDIF
IF Target3<>' '
?? ', '
?? Target3
ENDIF
? 'Subtracting: '
?? Object1
IF Object2<>' '
?? ', '
?? Object2
ENDIF
IF Object3<>' '
?? ', '
?? Object3
ENDIF
? 'Designations: '
IF EMATCH=0 .AND. HMATCH=0 .AND. OMATCH=0 .AND. IMATCH=0
?? 'All'
ENDIF
IF EMATCH=1
?? 'Exact, '
ENDIF
IF HMATCH=1
?? 'Human, '
ENDIF
IF OMATCH=1
?? 'Other sp. '
ENDIF
IF IMATCH=1
?? 'INCYTE'
ENDIF
IF ANAL=1
? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2
? 'Arranged by FUNCTION'
ENDIF






? 'Total clones represented: '
?? STR(TOT,5,0)
? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,2)
?? ' minutes'
?
? 'd = designation f = distribution z = location r = function s = species i = inte
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
DO CASE
CASE ANAL=1
?? STR(AUNIQUE,4,0)
?? ' genes, for a total of '
?? STR(ANALTOT,4,0)
?? ' clones'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I
SET PRINT OFF
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

CASE ANAL=2
* arrange/function
SET PRINT ON
SET HEADING ON
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",266 COLOR 0
?
? 'BINDING PROTEINS'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Surface molecules and receptors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='B'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Calcium-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='C'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ligands and effectors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='S'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='I'

?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? 'ONCOGENES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'General oncogenes:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='O'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'GTP-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='G'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Viral elements:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='V'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Kinases and Phosphatases:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Y'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Tumor-related antigens:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='A'

?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? 'PROTEIN SYNTHETIC MACHINERY PROTEINS'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Transcription and Nucleic Acid-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='0'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Translation:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='T'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ribosomal proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='R'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Protein processing:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='L'

?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
?
? 'ENZYMES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ferroproteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='F'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Proteases and inhibitors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='P'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Oxidative phosphorylation:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Z'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Sugar metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Q'

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Amino acid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='M'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Nucleic acid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='N'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Lipid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='W'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other enzymes:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='E'

?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
?
? 'MISCELLANEOUS CATEGORIES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Stress response:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='H'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Structural:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='K'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other clones:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='X'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Clones of unknown function:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS NUMBER,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='U'
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
ENDDO
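The subtraction program above filters clones by designation, collapses the target and background libraries to one record per entry and designation (COMPRESSION SUBROUTINES A and B), looks each target entry up in the background (FUSION ROUTINE), and sorts by ratio. The Python sketch below restates that flow under stated assumptions: the Clone record, the compress and subtract functions, and the in-memory lists are hypothetical stand-ins for the FoxBASE databases, and the clone-number range (INITIATE to TERMINATE) and report formatting are omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """Hypothetical stand-in for one record of Clones.dbf."""
    number: int
    library: str
    d: str           # designation code, e.g. 'E', 'H', 'O', 'I'
    entry: str       # matched database entry, e.g. 'HUMIL1'
    descriptor: str


def compress(clones):
    """One count per (entry, designation), like COMPRESSION SUBROUTINES A and B."""
    return Counter((c.entry, c.d) for c in clones)


def subtract(clones, targets, backgrounds, designations=()):
    """Rank target entries by abundance relative to the background libraries.

    Returns (entry, descriptor, bgfreq, rfend, ratio) tuples ordered like
    SORT ON RATIO/D, BGFREQ/D, DESCRIPTOR.
    """
    if designations:  # an empty selection means "All"
        clones = [c for c in clones if c.d in designations]
    target = [c for c in clones if c.library in targets]
    background = [c for c in clones if c.library in backgrounds]

    # LOCATE FOR ENTRY=SCANNER stops at the first match in a file sorted by
    # RFEND descending, i.e. the largest (entry, designation) group.
    bg = {}
    for (entry, _d), n in compress(background).items():
        bg[entry] = max(bg.get(entry, 0), n)

    descriptors = {c.entry: c.descriptor for c in target}
    rows = []
    for (entry, _d), rfend in compress(target).items():
        bgfreq = bg.get(entry, 0)
        actual = bgfreq if bgfreq else 0.5  # STORE 1/2 TO BIT1 when not found
        rows.append((entry, descriptors[entry], bgfreq, rfend, rfend / actual))
    rows.sort(key=lambda r: (-r[4], -r[2], r[1]))
    return rows


if __name__ == "__main__":
    demo = [
        Clone(1, "THP-1", "E", "HUMIL1", "IL 1-beta"),
        Clone(2, "THP-1", "E", "HUMIL1", "IL 1-beta"),
        Clone(3, "THP-1", "E", "HUMMIP1A", "MIP-1"),
        Clone(4, "THP-1", "E", "HUMMIP1A", "MIP-1"),
        Clone(5, "THP-1", "E", "HUMMIP1A", "MIP-1"),
        Clone(6, "HMC", "E", "HUMMIP1A", "MIP-1"),
    ]
    for row in subtract(demo, {"THP-1"}, {"HMC"}):
        print(row)
```

In the demo, HUMIL1 (2 clones, absent from HMC, ratio 2 / 0.5 = 4.0) ranks above HUMMIP1A (3 clones against 1, ratio 3.0), the same ordering rule that places IL 1-beta at the top of Table 4.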
* Northern (single), version 11-25-94
CLOSE DATABASES
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.

* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,31 TO 46,397 STYLE 58447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 113,98 SAY "Entry #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1
* EOF: Northern (single).fmt
READ
IF Bail=2
CLEAR
SCREEN 1 OFF
RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
STORE UPPER(Eobject) TO Eobject
SET SAFETY OFF
SORT ON Entry TO "Lookup entry.dbf"
SET SAFETY ON
USE "Lookup entry.dbf"

LOCATE FOR Entry=Eobject

IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF
SORT ON Descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(Descriptor))=UPPER(TRIM(Dobject))
IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF
IF Numb<>0
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF

CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
SCREEN 1 OFF
RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Libraries.dbf
* Reduce the libraries file to one record per library name.
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
SW2 = 1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1+1
LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR Entry=Searchval
SET SAFETY ON
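The Northern (single) program copies every clone whose ENTRY equals the chosen Searchval into Hits.dbf and builds a de-duplicated list of libraries. The Python sketch below shows the per-library tally such a search supports. It assumes each clone is reduced to a (library, entry) pair; the optional per-library totals used to express hits as a percentage are an assumption, because the listing stops before the report is printed. The function name electronic_northern is hypothetical.

```python
from collections import Counter


def electronic_northern(clones, searchval, library_totals=None):
    """Tally clones matching `searchval` in each library.

    clones         -- iterable of (library, entry) pairs (stand-in for clones.dbf)
    searchval      -- the entry chosen on the search screen, e.g. 'HUMIL1'
    library_totals -- optional {library: total clones}; if given, a percentage
                      abundance is reported, otherwise the percentage is None
    Returns (library, hits, percent) tuples sorted by library name.
    """
    hits = Counter(lib for lib, entry in clones if entry == searchval)
    rows = []
    for lib in sorted(hits):
        total = (library_totals or {}).get(lib)
        percent = 100.0 * hits[lib] / total if total else None
        rows.append((lib, hits[lib], percent))
    return rows
```

Libraries with no matching clone are simply absent from the result, which mirrors the fact that Hits.dbf only contains matching records.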
* MASTER ANALYSIS 3; VERSION 12-9-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN
SET DEFAULT TO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:"
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF
DO WHILE .T.

* Program.: Master analysis.fmt
* Date....: 12/ 9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
@ PIXELS 39,255 TO 277,430 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 27,98 SAY "Customized Output Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1
@ PIXELS 45,54 GET CONDEN STYLE 65536 FONT "Chicago",12 PICTURE "@*C Condensed format" SIZE
@ PIXELS 54,261 GET ANAL STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Sort/number;Sort/entry;
@ PIXELS 117,126 GET EMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact" SIZE 15,62 CO
@ PIXELS 135,126 GET HMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1
@ PIXELS 153,126 GET OMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spec" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",268 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 63,54 GET PRINTON STYLE 65536 FONT "Chicago",12 PICTURE "@*C Include clone listing"
@ PIXELS 171,126 GET IMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO
@ PIXELS 252,146 GET INITIATE STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,146 GET TERMINATE STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 234,134 SAY "Include clones" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,125 SAY "->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 198,126 GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 189,0 TO 257,120 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 209,3 SAY "Library selection" STYLE 65536 FONT "Geneva",266 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 227,18 GET ENTIRE STYLE 65536 FONT "Chicago",12 PICTURE "@*RV All;Selected" SIZE 16
* EOF: Master analysis.fmt
READ
IF ANAL=9
CLEAR
CLOSE DATABASES
ERASE TEMPMASTER.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
CLEAR
? INITIATE
? TERMINATE
? CONDEN
? ANAL
? EMATCH
? HMATCH
? OMATCH
? IMATCH
SET TALK ON

IF ENTIRE=2
USE "Unique libraries.dbf"
REPLACE ALL i WITH ' '
BROWSE FIELDS i,libname,library,total,entered AT 0,0
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
*COPY TO TEMPNUM FOR NUMBER>=INITIATE .AND. NUMBER<=TERMINATE
*USE TEMPNUM
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE=2
USE "Unique libraries.dbf"
COPY TO SELECTED FOR UPPER(i)='Y'
USE SELECTED
STORE RECCOUNT() TO STOPIT
MARK = 1
DO WHILE .T.
IF MARK>STOPIT
CLEAR
EXIT
ENDIF
USE SELECTED
GO MARK
STORE library TO THISONE
? 'COPYING '
?? THISONE
USE TEMPLIB
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF EMATCH=0 .AND. HMATCH=0 .AND. OMATCH=0 .AND. IMATCH=0
APPEND FROM TEMPLIB
ENDIF
IF EMATCH=1
APPEND FROM TEMPLIB FOR D='E'
ENDIF
IF HMATCH=1
APPEND FROM TEMPLIB FOR D='H'
ENDIF
IF OMATCH=1
APPEND FROM TEMPLIB FOR D='O'
ENDIF
IF IMATCH=1
APPEND FROM TEMPLIB FOR D='I' .OR. D='X' .OR. D='N'
ENDIF
IF XMATCH=1
APPEND FROM TEMPLIB FOR D='X'
ENDIF
COUNT TO ANALTOT

SET TALK OFF
DO CASE
CASE PTF=0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF=1
SET ALTERNATE TO "Total function sort.txt"
*SET ALTERNATE TO "H and O function sort.txt"
*SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"
*SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"
*SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"
*SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"
*SET ALTERNATE TO "Shear Stress HUVEC 2:Clone list.txt"
*SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"
SET ALTERNATE ON
ENDCASE
IF PRINTON=1
@ 1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
ENDIF

?
?
?
?
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
? 'All libraries'
ENDIF
IF ENTIRE=2
MARK = 1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations: '
IF EMATCH=0 .AND. HMATCH=0 .AND. OMATCH=0 .AND. IMATCH=0
?? 'All'
ENDIF
IF EMATCH=1
?? 'Exact, '
ENDIF
IF HMATCH=1
?? 'Human, '
ENDIF
IF OMATCH=1
?? 'Other sp. '
ENDIF
IF IMATCH=1
?? 'INCYTE'
ENDIF
IF XMATCH=1
?? 'EST'
ENDIF

IF CONDEN=1
? 'Condensed format analysis'
ENDIF
IF ANAL=1
? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
? 'l = library d = designation f = distribution z = location r = function c = cer
?
USE TEMPDESIG
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
DO CASE
CASE ANAL=1
* sort/number
SET HEADING ON
IF CONDEN=1
SORT TO TEMP1 ON ENTRY, NUMBER
DO "COMPRESSION number.PRG"
ELSE
SORT TO TEMP1 ON NUMBER
USE TEMP1
list off fields number, L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
*list off fields number, L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL=2
* sort/DESCRIPTOR
SET HEADING ON
*SORT TO TEMP1 ON DESCRIPTOR, ENTRY, NUMBER/S FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
*SORT TO TEMP1 ON ENTRY, DESCRIPTOR, NUMBER/S FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
SORT TO TEMP1 ON ENTRY, START/S FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
IF CONDEN=1
DO "COMPRESSION entry.PRG"
ELSE
USE TEMP1
list off fields number, L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL=3
* sort by abundance
SET HEADING ON
SORT TO TEMP1 ON ENTRY, NUMBER FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
DO "Compression abundance.prg"
CASE ANAL=4
* sort/interest
SET HEADING ON
IF CONDEN=1
SORT TO TEMP1 ON ENTRY, NUMBER FOR I>0
DO "COMPRESSION interest.PRG"
ELSE
SORT ON I/D, ENTRY TO TEMP1 FOR I>1
USE TEMP1
list off fields number, L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL=5
* arrange/location
SET HEADING ON
STORE 4 TO AMPLIFIER
? 'Nuclear:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cytoplasmic:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cytoskeleton:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cell surface:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Intracellular membrane:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Mitochondrial:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Secreted:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Unknown:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
IF CONDEN=1
SET DEVICE TO PRINTER
SET PRINTER ON
EJECT
DO "Output heading.prg"
USE "Analysis location.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
? 'FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL'
LIST OFF FIELDS Z, NAME, CLONES, GENES, NEW, PERCENT, GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON
*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF
CASE ANAL=6
* arrange/distribution
SET HEADING ON
STORE 3 TO AMPLIFIER
? 'Cell/tissue specific distribution:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Non-specific distribution:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Unknown distribution:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
IF CONDEN=1
SET DEVICE TO PRINTER
SET PRINTER ON
EJECT
DO "Output heading.prg"
USE "Analysis distribution.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
? 'FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL'
?
LIST OFF FIELDS P, NAME, CLONES, GENES, PERCENT, GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON
*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF
CASE ANAL=7
* arrange/function
SET HEADING ON
STORE 10 TO AMPLIFIER
? 'BINDING PROTEINS'
?
? 'Surface molecules and receptors:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Calcium-binding proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Ligands and effectors:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other binding proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
*EJECT
? 'ONCOGENES'
?
? 'General oncogenes:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'GTP-binding proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Viral elements:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Kinases and Phosphatases:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Tumor-related antigens:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
*EJECT
? 'PROTEIN SYNTHETIC MACHINERY PROTEINS'
?
? 'Transcription and Nucleic Acid-binding proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Translation:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Ribosomal proteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Protein processing:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 2"
ENDIF
*EJECT
? 'ENZYMES'
?
? 'Fetoproteins:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Proteases and inhibitors:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE






WO 95/20681 



PCTYUS95/01160 



DO "Normal subroutine 1"
ENDIF
? 'Oxidative phosphorylation:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Sugar metabolism:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Amino acid metabolism:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Nucleic acid metabolism:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Lipid metabolism:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other enzymes:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
*EJECT
? 'MISCELLANEOUS CATEGORIES'
?
? 'Stress response:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Structural:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other clones:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Clones of unknown function:'
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
IF CONDEN=1
EJECT
*SET DEVICE TO PRINTER
*SET PRINT ON
DO "Output heading.prg"
***
USE "Analysis function.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
***
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
***
? ' TOTAL TOTAL NEW DIST'
? 'FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'
***
*LIST OFF FIELDS P, NAME, CLONES, GENES, NEW, PERCENT, GRAPH, COMPANY
LIST OFF FIELDS P, NAME, CLONES, GENES, NEW, PERCENT, GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON
*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF
CASE ANAL=8
DO "Subgroup summary 3.prg"
ENDCASE
DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
*ERASE TEMPLIB.DBF
*ERASE TEMPNUM.DBF
*ERASE TEMPDESIG.DBF
*ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D='H' .OR. D='O'
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z=LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG 
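Each COMPRESSION subroutine depends on TEMP1 already being sorted on ENTRY. It walks the table, deletes every record whose ENTRY repeats the one at MARK1, and writes the length of that run into RFEND of the surviving record. RFEND therefore becomes the number of clones matching one reference entry. The Python sketch below restates that run-length collapse as an illustration only; it is not the patent's implementation. Records are modeled as dicts with hypothetical keys `entry`, `d` (designation) and `rfend`. The new-gene count follows the `D='H' .OR. D='O'` test in the listing.

```python
from itertools import groupby


def compress(records):
    """Collapse runs of equal 'entry' values in a list already sorted on entry.

    Returns (kept, tot, unique, new_genes). The first record of each run is
    kept, with 'rfend' set to the run length; later duplicates are dropped,
    which is what the DELETE/PACK steps achieve in the xBase loop.
    """
    kept = []
    for _, run in groupby(records, key=lambda r: r["entry"]):
        run = list(run)
        first = dict(run[0])       # copy, so the caller's records are untouched
        first["rfend"] = len(run)  # clones sharing this reference entry
        kept.append(first)
    tot = len(records)             # COUNT TO TOT
    unique = len(kept)             # COUNT TO UNIQUE after PACK
    new_genes = sum(1 for r in kept if r["d"] in ("H", "O"))  # D='H' .OR. D='O'
    kept.sort(key=lambda r: r["rfend"], reverse=True)         # SORT ON RFEND/D
    return kept, tot, unique, new_genes
```

`itertools.groupby` only merges adjacent equal keys. For that reason the input must be sorted on `entry` first, just as the xBase code runs `SORT TO TEMP1 ON ENTRY, NUMBER` before calling the subroutine.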






* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? ' V Coincidence'
COUNT TO P4 FOR I=4
IF P4>0
? STR(P4,3,0)
?? ' genes with priority = 4 (Secondary analysis)'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=4
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
? STR(P3,3,0)
?? ' genes with priority = 3 (Full insert sequence)'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=3
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
? STR(P2,3,0)
?? ' genes with priority = 2 (Primary analysis complete)'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=2
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
? STR(P1,3,0)
?? ' genes with priority = 1 (Primary analysis needed)'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=1
ENDIF
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
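The interest variant of the subroutine prints the compressed genes in four blocks, from priority I=4 down to I=1, and skips any block that is empty. A hedged Python sketch of that grouping follows. The `i`, `entry` and `rfend` record keys and the one-line-per-gene format are assumptions made for illustration; the four block labels are taken from the listing.

```python
# Labels copied from the listing; record keys are hypothetical.
PRIORITY_LABELS = {
    4: "Secondary analysis",
    3: "Full insert sequence",
    2: "Primary analysis complete",
    1: "Primary analysis needed",
}


def priority_report(genes):
    """Return report lines grouping compressed genes by priority, 4 down to 1."""
    lines = []
    for level in (4, 3, 2, 1):
        block = [g for g in genes if g["i"] == level]  # COUNT TO Pn FOR I=n
        if not block:                                  # IF Pn>0 ... ENDIF
            continue
        lines.append(f"{len(block):3d} genes with priority = {level} "
                     f"({PRIORITY_LABELS[level]})")     # STR(Pn,3,0) + label
        lines.extend(f"  {g['entry']}  {g['rfend']}" for g in block)
    return lines
```

The three-character count field mirrors `STR(Pn,3,0)` in the xBase code.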
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D='H' .OR. D='O'
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
***
? ' V Coincidence'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
*SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",13 COLOR 0,0,
*list off fields RFEND, S, DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG 
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE F TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG 
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG 
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
COPY TO TEMP1 FOR
USE TEMP1
COUNT TO IDGENE FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='R' .OR. D='A'
DELETE FOR D='N' .OR. D='D' .OR. D='A' .OR. D='U' .OR. D='S' .OR. D='M' .OR. D='R' .OR. D='V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? 'Coincidence V V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list fields number, RFEND, start, l, d, f, z, r, c, entry, s, descriptor, init, i
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
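This normalized-abundance variant first counts IDGENE, the clones whose designation D is E, O, H, N, R or A. It then drops the N, D, A, U, S, M, R and V records before compressing. Finally it stores RFEND/IDGENE*10000 in START, which is the number of clones per 10,000 identified clones. The Python sketch below reproduces that arithmetic under stated assumptions. Records are dicts with hypothetical keys `entry` and `d`. Ties in abundance are broken by entry name here, not by clone NUMBER as in the xBase SORT.

```python
from collections import Counter

# Designation codes read from the listing: clones counted toward IDGENE,
# and clones deleted before the run-length compression.
IDENTIFIED = {"E", "O", "H", "N", "R", "A"}
DROPPED = {"N", "D", "A", "U", "S", "M", "R", "V"}


def normalized_abundance(clones, scale=10000):
    """Return (idgene, rows); each row is (entry, rfend, start).

    rfend is the number of retained clones sharing an entry; start is that
    count expressed per `scale` identified clones (RFEND/IDGENE*10000).
    """
    idgene = sum(1 for c in clones if c["d"] in IDENTIFIED)
    if idgene == 0:
        return 0, []  # nothing identified: avoid dividing by zero
    kept = [c for c in clones if c["d"] not in DROPPED]
    counts = Counter(c["entry"] for c in kept)
    rows = [(entry, n, n * scale / idgene) for entry, n in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))  # most abundant first
    return idgene, rows
```

Because the denominator counts every identified clone, including the N, R and A records that are later dropped, the START values of the listed genes need not add up to 10,000.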
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO IDGENE FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='R' .OR. D='A'
DELETE FOR D='N' .OR. D='D' .OR. D='A' .OR. D='U' .OR. D='S' .OR. D='M' .OR. D='R' .OR. D='V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW=0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? 'Coincidence V V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list fields number, RFEND, START, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, INIT, I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number, L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
list off fields number, L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG 
*Lifeseq menu, version 8-7-94
SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
DO WHILE .T.
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",268 COLOR 0,0,
@ PIXELS 13,126 TO 77,365 STYLE 28479 COLOR 32767,-25600,-1,-16223,-16721,-15725
@ PIXELS 110,29 TO 188,217 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 45,161 SAY "LIFESEQ" STYLE 65536 FONT "Geneva",536 COLOR 0,0,-1,-1,7135,5884
@ PIXELS 36,269 SAY "TM" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,7135,5884
@ PIXELS 63,143 SAY "Molecular Biology Desktop" STYLE 65536 FONT "Helvetica",18 COLOR 0,0,0,
@ PIXELS 90,252 TO 251,467 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 117,270 GET Chooser STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Transcript profiles
@ PIXELS 135,128 SAY Update STYLE 0 FONT "Geneva",12 SIZE 15,73 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 171,128 SAY cloneno STYLE 0 FONT "Geneva",12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 135,44 SAY "Last update:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,44 SAY "Total Clones:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 43,296 SAY "v1.30" STYLE 65536 FONT "Geneva",782 COLOR 0,0,-1,-1,-1,-1
*
* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser=1
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
CASE Chooser=2
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Subtraction 2.prg"
CASE Chooser=3
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Northern (single).prg"
CASE Chooser=4
USE "Libraries.dbf"
BROWSE
CASE Chooser=5
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:See individual clone.prg"
CASE Chooser=6
DO "SmartGuy:FoxBASE+/Mac:fox files:Libraries:Output programs:Menu.prg"
CASE Chooser=7
CLEAR
SCREEN 1 OFF
RETURN
ENDCASE
LOOP
ENDDO
@1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
?
?
?
? DATE()
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
? 'All libraries'
ENDIF
IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? ' '
?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Cmatch=0
?? 'All'
ENDIF
IF Ematch=1
?? 'Exact, '
ENDIF
IF Hmatch=1
?? 'Human, '
ENDIF
IF Cmatch=1
?? 'Other sp.'
ENDIF
IF CONDEN=1
? 'Condensed format analysis'
ENDIF
IF ANAL=1
? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
? 'Arranged by FUNCTION'
ENDIF






? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
?
?
USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number, L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
list off fields number, L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG 






USE TEMP1
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)
?? ' clones'
*list off fields number, L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,INIT,I
list off fields number, L,D,F,Z,R,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG 
*Northern (single), version 11-25-94
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT ON
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,39? STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,86 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1,
* EOF: Northern (single).fmt
READ
IF Bail=2
CLEAR
screen 1 off
RETURN
ENDIF

USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
STORE UPPER(Eobject) TO Eobject
SET SAFETY OFF
SORT ON Entry TO "Lookup entry.dbf"
SET SAFETY ON
USE "Lookup entry.dbf"
LOCATE FOR Look=Eobject
IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF
IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF
SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF
IF Numb<>0
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
screen 1 off
RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK
SW2 = 1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1+1
LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=Searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
Mark=1
DO WHILE .T.
SELECT 1
IF Mark>Entries
EXIT
ENDIF
GO Mark
STORE library TO Jigger
SELECT 2
COUNT TO Zog FOR library=Jigger
SELECT 1
REPLACE hits WITH Zog
Mark=Mark+1
LOOP
ENDDO

S FIELDS LIBRARY, LIBNAME, ENTERED, HITS AT 0,0
CLEAR 

? 'Enter Y to print: '
WAIT TO PRINSET
IF UPPER(PRINSET)='Y'
SET PRINT ON
CLEAR
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
? 'DATABASE ENTRIES MATCHING ENTRY '
?? Searchval
? DATE()
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS library, libname, entered, hits
*
*
SELECT 2
LIST OFF FIELDS NUMBER, LIBRARY, D, S, F, Z, R, ENTRY, DESCRIPTOR, R? START, START, RFEND
SET TALK OFF
SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN 
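Northern (single) uses the clone database as an electronic northern blot. It first collapses the library list to one record per library code, discarding libraries whose ENTERED value is zero. It then copies the clones whose ENTRY equals Searchval into Hits.dbf and stores, for each library, how many of those hits it holds. The Python sketch below performs the same tally. The input shapes are assumptions rather than the program's file layout: an iterable of `(library, entered)` pairs, and clone dicts with hypothetical `library` and `entry` keys.

```python
from collections import Counter


def electronic_northern(clones, libraries, searchval):
    """Return [(library_code, hits)] for one reference entry.

    libraries: iterable of (library_code, entered) pairs. Codes with
    entered == 0 are skipped and duplicate codes collapse to one row,
    mirroring the 'Compressed libraries' step; output is sorted on the code.
    """
    codes = sorted({code for code, entered in libraries if entered != 0})
    hits = Counter(c["library"] for c in clones if c["entry"] == searchval)
    return [(code, hits[code]) for code in codes]
```

A library with no matching clone is reported with a count of zero. This matches the xBase loop, which writes Zog into HITS for every library record.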
TABLE 6

library
ADENINB01
ADRENOR01
ADRENOT01
AMLBNOT01
BMARNOT02
CARDNOT01
CHAONOT01
CORNNOT01
FIBRAGT01
FIBRAGT02
FIBRANT01
FIBRNGT01
FIBRNGT02
FIBRNOT01
FIBRNOT02
HMC1NOT01
HUVELPB01
HUVENOB01
HUVESTB01
HYPONOS01
KIDNNOT01
LIVRNOT01
LUNGNOT01
MUSCNOT01
OVIDNOT01
PANCNOT01
PITUNOR01
PITUNOT01
PLACNOS01
SINTNOT02
SPLNFET01
SPLNNOT02
STOMNOT01
SYNORAB01
TBLYNOT01
TESTNOT01
THP1NOB01
THP1PEB01
THP1PLB01
U937NOT01

libname
Inflamed adenoid
Adrenal gland (r)
Adrenal gland (T)
AML blast cells (T)
Bone marrow
Bone marrow (T)
Cardiac muscle (T)
Chin. hamster ovary
Corneal stroma
Fibroblast, AT 5
Fibroblast, AT 30
Fibroblast, AT
Fibroblast, uv 5
Fibroblast, uv 30
Fibroblast
Fibroblast, normal
Mast cell line HMC-1
HUVEC IFN,TNF,LPS
HUVEC control
HUVEC shear stress
Hypothalamus
Kidney (T)
Liver (T)
Lung (T)
Skeletal muscle (T)
Oviduct
Pancreas, normal
Pituitary (r)
Pituitary (T)
Placenta
Small intestine (T)
Spleen/liver, fetal
Spleen (T)
Stomach
Rheum. synovium
T + B lymphoblast
Testis (T)
THP-1 control
THP phorbol
THP-1 phorbol LPS
U937, monocytic leuk



number  library    d  s  f  z  r  entry    descriptor                r?start  start  rfend
2304    U937NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta  0        0      773
3240    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta  0        370    773
3259    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta  0        371    773
4?93    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta  0        470    773
33?9    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta  0        327    773
5139    HMC1NOT01  E  H  C  C  T  HUMEF1B  Elongation factor 1-beta  0        375    773

WHAT IS CLAIMED IS:

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.


2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;

making cDNA copies of the mRNA;

isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.






WO 95/20681 



PCT/US95/01160 



6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.

8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximates the gene
transcript image of the biological specimen.

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;

(c) inserting the cDNA into a suitable vector and
using said vector to transfect suitable host strain cells
which are plated out and permitted to grow into clones,
each clone representing a unique mRNA;

(d) isolating a representative population of
recombinant clones;

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies the gene from which the unique mRNA was
transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.



13. The method of claim 12, also including the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from a random sample of normal and diseased
humans, encompassing a variety of diseases, to produce
reference sets of normal and diseased gene transcript
images;

obtaining a test specimen from a human, and producing
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the
reference sets of gene transcript images; and

identifying at least one of the reference gene
transcript images which most closely approximates the test
gene transcript image.

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the
computer system in which a database of reference transcript
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.

15. The system of claim 14, also including:
library generation means for producing the library of
biological sequences and generating said set of transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:

means for obtaining a mixture of mRNA;

means for making cDNA copies of the mRNA;

means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;

means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.

COMPARATIVE GENE TRANSCRIPT ANALYSIS



1. FIELD OF INVENTION

The present invention is in the field of molecular
biology and computer science; more particularly, the
present invention describes methods of analyzing gene
transcripts and diagnosing the genetic expression of cells
and tissue.

2. BACKGROUND OF THE INVENTION

Until very recently, the history of molecular biology
has been written one gene at a time. Scientists have
observed the cell's physical changes, isolated mixtures
from the cell or its milieu, purified proteins, sequenced
proteins and therefrom constructed probes to look for the
corresponding gene.

Recently, different nations have set up massive
projects to sequence the billions of bases in the human
genome. These projects typically begin with dividing the
genome into large portions of chromosomes and then
determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions
thereof, known as motifs. Unfortunately, the majority of
genomic DNA does not encode proteins and, though it is
postulated to have some effect on the cell's ability to
make protein, its relevance to medical applications is not
understood at this time.

A third methodology involves sequencing only the
transcripts encoding the cellular machinery actively
involved in making protein, namely the mRNA. The advantage
is that the cell has already edited out all the non-coding
DNA, and it is relatively easy to identify the
protein-coding portion of the RNA. The utility of this
approach was not immediately obvious to genomic
researchers. In fact, when cDNA sequencing was initially
proposed, the method was roundly denounced by those
committed to genomic sequencing. For example, the head of
the U.S. Human Genome Project discounted cDNA sequencing as
not valuable and refused to approve funding of such
projects.

In this disclosure, we teach methods for analyzing
DNA, including cDNA libraries. Based on our analyses and
research, we see each individual gene product as a "pixel"
of information, which relates to the expression of that,
and only that, gene. We teach herein methods whereby the
individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which
each of the individual genes can be visualized
simultaneously, allowing relationships between the gene
pixels to be easily visualized and understood.

We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture,
one which describes the temporality or dynamics of gene
expression, at the level of a cell or a whole tissue. It
is that sense of "motion" of cellular machinery on the
scale of a cell or organ which constitutes the new
invention herein. This constitutes a new view into the
process of living cell physiology and one which holds great
promise to unveil and discover new therapeutic and
diagnostic approaches in medicine.

We teach another method which we call "electronic
northern," which tracks the expression of a single gene
across many types of cells and tissues.
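
The electronic northern can be illustrated with a short Python sketch that looks up one entry across several libraries and normalizes each count by library size. It assumes, for illustration only, that each library is a dictionary mapping an entry name to the number of clones matching it; the function name `electronic_northern`, the placeholder labels and the per-thousand scaling are hypothetical choices, not taken from the disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: library name -> {entry name: clone count}.
Libraries = Dict[str, Dict[str, int]]


def electronic_northern(entry: str, libraries: Libraries) -> List[Tuple[str, int, float]]:
    """Return (library, count, per-thousand abundance) for one entry in every
    library, sorted from highest to lowest relative abundance."""
    rows = []
    for lib_name, counts in libraries.items():
        total = sum(counts.values())
        hits = counts.get(entry, 0)
        # Normalize by library size so libraries sequenced to different depths compare fairly.
        per_thousand = 1000.0 * hits / total if total else 0.0
        rows.append((lib_name, hits, per_thousand))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


libs = {
    "LIB_A": {"GENE_X": 5, "OTHER": 995},
    "LIB_B": {"GENE_X": 1, "OTHER": 99},
    "LIB_C": {"OTHER": 50},
}
for row in electronic_northern("GENE_X", libs):
    print(row)
```

Normalizing matters because libraries are rarely sequenced to the same depth: in this toy example, 5 hits in a 1,000-clone library is a lower abundance than 1 hit in a 100-clone library.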

Nucleic acids (DNA and RNA) carry within their
sequence the hereditary information and are therefore the
prime molecules of life. Nucleic acids are found in all
living organisms including bacteria, fungi, viruses, plants
and animals. It is of interest to determine the relative
abundance of different discrete nucleic acids in different
cells, tissues and organisms over time under various
conditions, treatments and regimes.

All dividing cells in the human body contain the same
set of 23 pairs of chromosomes. It is estimated that these
autosomal and sex chromosomes encode approximately 100,000
genes. The differences among different types of cells are
believed to reflect the differential expression of the
100,000 or so genes. Fundamental questions of biology
could be answered by understanding which genes are
transcribed and knowing the relative abundance of
transcripts in different cells.

Previously, the art has only provided for the analysis
of a few known genes at a time by standard molecular
biology techniques such as PCR, northern blot analysis, or
other types of DNA probe analysis such as in situ
hybridization. Each of these methods allows one to analyze
the transcription of only known genes and/or small numbers
of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991);
Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18,
2789-92 (1989); European J. Neuroscience 2, 1063-1073
(1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988);
Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci.
USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988);
Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose
transcription is induced or otherwise regulated during cell
processes such as activation, differentiation, aging, viral
transformation, morphogenesis, and mitosis have been
pursued for many years, using a variety of methodologies.
One of the earliest methods was to isolate and analyze
levels of the proteins in a cell, tissue, organ system, or
even organism both before and after the process of
interest. One method of analyzing multiple proteins in a
sample is 2-dimensional gel electrophoresis, wherein
proteins can, in principle, be identified and quantified as
individual bands, and ultimately reduced to a discrete
signal. At present, 2-dimensional analysis resolves only
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence
analysis using Edman degradation. Unfortunately, most of
the bands were present in quantities too small to obtain a
reliable sequence, and many of those bands contained more
than one discrete protein. An additional difficulty is
that many of the proteins were blocked at the
amino-terminus, further complicating the sequencing
process.

Analyzing differentiation at the gene transcription
level has overcome many of these disadvantages and
drawbacks, since the power of recombinant DNA technology
allows amplification of signals containing very small
amounts of material. The most common method, called
"hybridization subtraction," involves isolation of mRNA
from the biological specimen before (B) and after (A) the
developmental process of interest, transcribing one set of
mRNA into cDNA, subtracting specimen B from specimen A
(mRNA from cDNA) by hybridization, and constructing a cDNA
library from the non-hybridizing mRNA fraction. Many
different groups have used this strategy successfully, and
a variety of procedures have been published and improved
upon using this same basic scheme. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990);
Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 187,
364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70 (1990);
GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc.
Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res.
19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42
(1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular
strengths and weaknesses, these methods still have some
limitations and undesirable aspects. First, the time
and effort required to construct such libraries is quite
large. Typically, a trained molecular biologist might
expect construction and characterization of such a library
to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction
libraries are typically inferior to the libraries
constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of
at least 10^6 clones, and an average insert size of 1-3 kb.
In contrast, subtracted libraries can have complexities of
10^2 or 10^3 and average insert sizes of 0.2 kb. Therefore,
there can be a significant loss of clone and sequence
information associated with such libraries. Third, this
approach allows the researcher to capture only the genes
induced in specimen A relative to specimen B, not
vice versa, nor does it easily allow comparison to a third
specimen of interest (C). Fourth, this approach requires
very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number
and type of subtractions that are possible, since many
tissues and cells are very difficult to obtain in large
quantities.

Fifth, the resolution of the subtraction is dependent
upon the physical properties of DNA:DNA or RNA:DNA
hybridization. The ability of a given sequence to find a
hybridization match is dependent on its unique CoT value.
The CoT value is a function of the number of copies
(concentration) of the particular sequence, multiplied by
the time of hybridization. It follows that for sequences
which are abundant, hybridization events will occur very
rapidly (low CoT value), while rare sequences will form
duplexes only at very high CoT values. CoT values which
allow such rare sequences to form duplexes and therefore be
effectively selected are difficult to achieve in a
convenient time frame. Therefore, hybridization
subtraction is simply not a useful technique with which to
study relative levels of rare mRNA species. Sixth, this
problem is further complicated by the fact that duplex
formation is also dependent on the nucleotide base
composition for a given sequence. Those sequences rich in
G + C form stronger duplexes than those with high contents
of A + T. Therefore, the former sequences will tend to be
removed selectively by hybridization subtraction. Seventh,
it is possible that hybridization between nonexact matches
can occur. When this happens, the expression of a
homologous gene may "mask" expression of a gene of
interest, artificially skewing the results for that
particular gene.
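
The dependence on CoT can be made concrete under idealized second-order reassociation kinetics, in which the single-stranded fraction after time t is C/C0 = 1/(1 + k*C0*t). This kinetic model and the arbitrary rate constant k are assumptions of the sketch rather than statements from the text; the point it illustrates is that the time needed to reach a given CoT scales inversely with a sequence's concentration.

```python
def fraction_single_stranded(k: float, c0: float, t: float) -> float:
    """Idealized second-order reassociation: C/C0 = 1 / (1 + k * C0 * t)."""
    return 1.0 / (1.0 + k * c0 * t)


def time_to_half_reassociation(k: float, c0: float) -> float:
    """Half the strands are duplexed when k * C0 * t = 1, i.e. at C0*t(1/2) = 1/k."""
    return 1.0 / (k * c0)


k = 1.0  # arbitrary rate constant, same for both sequences
t_abundant = time_to_half_reassociation(k, c0=1.0)
t_rare = time_to_half_reassociation(k, c0=0.001)  # a transcript 1,000 times rarer
print(t_abundant, t_rare)
```

With k held fixed, the rarer transcript needs about 1,000 times longer to reach half-reassociation, which is the practical obstacle to subtracting rare messages described above.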

Matsubara and Okubo proposed using partial cDNA
sequences to establish expression profiles of genes which
could be used in functional analyses of the human genome.
Matsubara and Okubo warned against using random priming, as
it creates multiple unique DNA fragments from individual
mRNAs and may thus skew the analysis of the number of
particular mRNAs per library. They sequenced randomly
selected members from a 3'-directed cDNA library and
established the frequency of appearance of the various
ESTs. They proposed comparing lists of ESTs from various
cell types to classify genes. Genes expressed in many
different cell types were labeled housekeepers and those
selectively expressed in certain cells were labeled
cell-specific genes, even in the absence of the full
sequence of the gene or the biological activity of the gene
product.
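
The classification proposed by Matsubara and Okubo can be sketched as counting EST appearances per library and then asking in how many cell types each gene appears. In this Python sketch, a gene present in every library is called housekeeping and a gene present in exactly one is called cell-specific; those exact cut-offs, the "intermediate" label for everything else, and the function names are our own simplifying assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def est_frequencies(est_labels: List[str]) -> Counter:
    """Frequency of appearance of each EST (by gene label) within one library."""
    return Counter(est_labels)


def classify_genes(libraries: Dict[str, List[str]]) -> Dict[str, str]:
    """Label each gene by the number of cell-type libraries it appears in."""
    present_in: Dict[str, Set[str]] = {}
    for lib_name, est_labels in libraries.items():
        for gene in est_frequencies(est_labels):
            present_in.setdefault(gene, set()).add(lib_name)
    n_libraries = len(libraries)
    labels: Dict[str, str] = {}
    for gene, libs in present_in.items():
        if len(libs) == n_libraries:
            labels[gene] = "housekeeping"   # seen in every cell type sampled
        elif len(libs) == 1:
            labels[gene] = "cell-specific"  # seen in only one cell type
        else:
            labels[gene] = "intermediate"   # placeholder for everything in between
    return labels


libs = {
    "liver": ["g1", "g1", "g2", "g3"],
    "lung": ["g1", "g3", "g4"],
    "kidney": ["g1", "g5"],
}
print(classify_genes(libs))
```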

The present invention avoids the drawbacks of the
prior art by providing a method to quantify the relative
abundance of multiple gene transcripts in a given
biological specimen by the use of high-throughput
sequence-specific analysis of individual RNAs and/or their
corresponding cDNAs.

The present invention offers several advantages over
current protein discovery methods which attempt to isolate
individual proteins based upon biological effects. The
method of the instant invention provides for detailed
diagnostic comparisons of cell profiles revealing numerous
changes in the expression of individual transcripts.

The instant invention provides several advantages over
current subtraction methods, including a more complex
library analysis (10^6 to 10^7 clones as compared to 10^3
clones), which allows identification of low-abundance
messages as well as identification of messages which
either increase or decrease in abundance. These large
libraries are routine to make, in contrast to the
libraries of previous methods. In addition, homologues can
easily be distinguished with the method of the instant
invention.

This method is very convenient because it organizes a
large quantity of data into a comprehensible, digestible
format. The most significant differences are highlighted
by electronic subtraction. In-depth analyses are made more
convenient.

The present invention provides several advantages over
previous methods of electronic analysis of cDNA. The
method is particularly powerful when more than 100, and
preferably more than 1,000, gene transcripts are analyzed.
In such a case, new low-frequency transcripts are
discovered and tissue typed.
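
Why analyzing more transcripts reveals more low-frequency messages follows from simple sampling arithmetic. If a transcript makes up a fraction f of a large library and n clones are picked at random, the chance of seeing it at least once is 1 - (1 - f)^n. The Python sketch below assumes this binomial model (each pick independent), which is an idealization rather than a statement from the text; the function names are illustrative.

```python
import math


def detection_probability(abundance: float, n_clones: int) -> float:
    """P(transcript seen at least once) when n clones are drawn independently."""
    return 1.0 - (1.0 - abundance) ** n_clones


def clones_needed(abundance: float, confidence: float) -> int:
    """Smallest number of clones giving at least `confidence` chance of detection."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))


for n in (100, 1000):
    print(n, round(detection_probability(0.001, n), 3))
print(clones_needed(0.001, 0.95))
```

Under this model, a transcript present at 0.1% of the library is detected less than 10% of the time with 100 clones, about 63% of the time with 1,000 clones, and needs roughly 3,000 clones for 95% confidence.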

High-resolution analysis of gene expression can be
used directly as a diagnostic profile or to identify
disease-specific genes for the development of more classic
diagnostic approaches.

This process is defined as gene transcript frequency
analysis. The resulting quantitative analysis of the gene
transcripts is defined as comparative gene transcript
analysis.

3. SUMMARY OF THE INVENTION

The invention is a method of analyzing a specimen
containing gene transcripts comprising the steps of (a)
producing a library of biological sequences; (b) generating
a set of transcript sequences, where each of the transcript
sequences in said set is indicative of a different one of
the biological sequences of the library; (c) processing the
transcript sequences in a programmed computer (in which a
database of reference transcript sequences indicative of
reference sequences is stored) to generate an identified
sequence value for each of the transcript sequences, where
each said identified sequence value is indicative of a
sequence annotation and a degree of match between one of
the biological sequences of the library and at least one of
the reference sequences; and (d) processing each said
identified sequence value to generate final data values
indicative of the number of times each identified sequence
value is present in the library.
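
Steps (c) and (d) can be sketched in Python as follows. Each transcript sequence is compared against a small in-memory reference database: an exact substring hit yields an "exact" degree of match, and otherwise the best `difflib.SequenceMatcher` similarity above a threshold yields "non-exact". The pair (annotation, degree of match) stands in for the identified sequence value, and a `Counter` of those pairs stands in for the final data values. The similarity measure, the 0.9 threshold, the sequences and all names are illustrative assumptions; a production pipeline would use a dedicated alignment tool instead.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, Iterable, Optional, Tuple

# (annotation, degree of match) stands in for an "identified sequence value".
IdentifiedValue = Tuple[str, str]


def identify(transcript: str, reference_db: Dict[str, str],
             min_ratio: float = 0.9) -> Optional[IdentifiedValue]:
    """Match one transcript sequence against the reference database."""
    best_name: Optional[str] = None
    best_ratio = 0.0
    for name, ref_seq in reference_db.items():
        if transcript in ref_seq:
            return (name, "exact")  # the read occurs verbatim in this reference
        # Crude global similarity in [0, 1]; a real pipeline would use an aligner.
        ratio = SequenceMatcher(None, transcript, ref_seq).ratio()
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio
    if best_name is not None and best_ratio >= min_ratio:
        return (best_name, "non-exact")
    return None  # no reference is similar enough


def final_data_values(transcripts: Iterable[str],
                      reference_db: Dict[str, str]) -> Counter:
    """Count how many times each identified sequence value occurs in the library."""
    counts: Counter = Counter()
    for seq in transcripts:
        value = identify(seq, reference_db)
        counts[value if value is not None else ("unidentified", "none")] += 1
    return counts


refs = {"GENE_A": "ACGTACGTACGGTTAA", "GENE_B": "TTTTGGGGCCCCAAAA"}
reads = ["ACGTACGT", "CGTACGGT", "TTTTGGGGCCCCAAAT", "GGGGGGGGGG"]
print(final_data_values(reads, refs))
```

Keeping the degree of match inside the counted key means that exact and non-exact hits to the same reference are tallied separately, mirroring the two match values of claim 6.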

The invention also includes a method of comparing two
specimens containing gene transcripts. The first specimen
is processed as described above. The second specimen is
used to produce a second library of biological sequences,
which is used to generate a second set of transcript
sequences, where each of the transcript sequences in the
second set is indicative of one of the biological sequences
of the second library. Then the second set of transcript
sequences is processed in a programmed computer to generate
a second set of identified sequence values, namely the
further identified sequence values, each of which is
indicative of a sequence annotation and includes a degree
of match between one of the biological sequences of the
second library and at least one of the reference sequences.
The further identified sequence values are processed to
generate further final data values indicative of the number
of times each further identified sequence value is present
in the second library. The final data values from the
first specimen and the further identified sequence values
from the second specimen are processed to generate ratios
of transcript sequences, which indicate the differences in
the number of gene transcripts between the two specimens.

In a further embodiment, the method includes
quantifying the relative abundance of mRNA in a biological
specimen by (a) isolating a population of mRNA transcripts
from a biological specimen; (b) identifying genes from
which the mRNA was transcribed by a sequence-specific
method; (c) determining the numbers of mRNA transcripts
corresponding to each of the genes; and (d) using the mRNA
transcript numbers to determine the relative abundance of
mRNA transcripts within the population of mRNA transcripts.
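
Step (d), turning transcript numbers into relative abundances, reduces to dividing each gene's count by the total number of identified transcripts. A minimal Python sketch, assuming the genes have already been identified and are supplied as a list of labels (the function name and the labels in the example are placeholders):

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(gene_labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of the transcript population contributed by each gene."""
    counts = Counter(gene_labels)
    total = sum(counts.values())
    # With no transcripts the comprehension is empty, so nothing divides by zero.
    return {gene: n / total for gene, n in counts.items()}


print(relative_abundance(["gene_a", "gene_a", "gene_b", "gene_c"]))
```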

Also disclosed is a method of producing a gene
transcript image analysis by first obtaining a mixture of
mRNA, from which cDNA copies are made. The cDNA is
inserted into a suitable vector which is used to transfect
suitable host strain cells which are plated out and
permitted to grow into clones, each clone representing a
unique mRNA. A representative population of clones
transfected with cDNA is isolated. Each clone in the
population is identified by a sequence-specific method
which identifies the gene from which the unique mRNA was
transcribed. The number of times each gene is identified
to a clone is determined to evaluate gene transcript
abundance. The genes and their abundances are listed in
order of abundance to produce a gene transcript image.
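
The final listing step can be sketched in Python as a sort of per-gene clone counts, most abundant first. Breaking ties alphabetically and attaching a rank are our own choices to make the listing reproducible; the function name and gene labels are placeholders.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def gene_transcript_image(clone_genes: Iterable[str]) -> List[Tuple[int, str, int]]:
    """List (rank, gene, clone count) in order of decreasing abundance."""
    counts = Counter(clone_genes)
    # Most abundant first; ties broken alphabetically so the listing is reproducible.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, gene, n) for rank, (gene, n) in enumerate(ordered, start=1)]


clones = ["gene_b", "gene_a", "gene_b", "gene_c", "gene_a", "gene_b"]
for row in gene_transcript_image(clones):
    print(row)
```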

In a further embodiment, the relative abundance of the
gene transcripts in one cell type or tissue is compared
with the relative abundance of gene transcript numbers in a
second cell type or tissue in order to identify the
differences and similarities.
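
One simple way to see differences and similarities between two cell types is to split the genes into those detected in both profiles and those detected in only one. This Python sketch assumes each profile is a dictionary of per-gene clone counts; the names are illustrative, and it reports only presence and absence, whereas quantitative differences would need the ratio approach described for Table 4.

```python
from typing import Dict, NamedTuple, Set


class ProfileComparison(NamedTuple):
    shared: Set[str]
    only_first: Set[str]
    only_second: Set[str]


def compare_profiles(first: Dict[str, int], second: Dict[str, int]) -> ProfileComparison:
    """Partition genes by whether they were detected (count > 0) in one or both profiles."""
    genes_first = {gene for gene, n in first.items() if n > 0}
    genes_second = {gene for gene, n in second.items() if n > 0}
    return ProfileComparison(
        shared=genes_first & genes_second,
        only_first=genes_first - genes_second,
        only_second=genes_second - genes_first,
    )


print(compare_profiles({"g1": 3, "g2": 1, "g4": 0}, {"g1": 5, "g3": 2, "g4": 1}))
```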

In a further embodiment, the invention includes a system
for analyzing a library of biological sequences, including
means for receiving a set of transcript sequences, where
each of the transcript sequences is indicative of a
different one of the biological sequences of the library,
and means for processing the transcript sequences in a
computer system in which a database of reference transcript
sequences indicative of reference sequences is stored,
wherein the computer is programmed with software for
generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and the degree
of match between a different one of the biological
sequences of the library and at least one of the reference
sequences, and for processing each said identified sequence
value to generate final data values indicative of the
number of times each identified sequence value is present
in the library.

In essence, the invention is a method and system for
quantifying the relative abundance of gene transcripts in a
biological specimen. The invention provides a method for
comparing the gene transcript image from two or more
different biological specimens in order to distinguish
between the two specimens and identify one or more genes
which are differentially expressed between the two
specimens. Thus this gene transcript image and its
comparison can be used as a diagnostic. One embodiment of
the method uses high-throughput sequence-specific analysis
of multiple RNAs or their corresponding cDNAs to generate a
gene transcript image. Another embodiment of the method
produces the gene transcript imaging analysis by the use of
high-throughput cDNA sequence analysis. In addition, two
or more gene transcript images can be compared and used to
detect or diagnose a particular biological state, disease,
or condition which is correlated to the relative abundance
of gene transcripts in a given cell or population of cells.

4. DESCRIPTION OF THE TABLES AND DRAWINGS 

4.1. TABLES 

Table 1 presents a detailed explanation of the letter
codes utilized in Tables 2-5.

Table 2 lists the one hundred most common gene
transcripts. It is a partial list of isolates from the
HUVEC cDNA library prepared and sequenced as described
below. The left-hand column refers to the sequence's order
of abundance in this table. The next column, labeled
"number," is the clone number of the first HUVEC sequence
identification reference matching the sequence in the
"entry" column. Isolates that have not been
sequenced are not present in Table 2. The next column,
labeled "N," indicates the total number of cDNAs which have
the same degree of match with the sequence of the reference
transcript in the "entry" column.

The column labeled "entry" gives the NIH GENBANK locus
name, which corresponds to the library sequence numbers.
The "s" column indicates in a few cases the species of the
reference sequence. The code for column "s" is given in
Table 1. The column labeled "descriptor" provides a plain
English explanation of the identity of the sequence
corresponding to the NIH GENBANK locus name in the "entry"
column.

Table 3 is a comparison of the top fifteen most 
abundant gene transcripts in normal monocytes and activated 
macrophage cells. 

Table 4 is a detailed summary of the library subtraction
analysis comparing the THP-1 and human macrophage
cDNA sequences. In Table 4, the same code as in Table 2 is
used. Additional columns are for "bgfreq" (abundance
number in the subtractant library), "rfend" (abundance
number in the target library) and "ratio" (the target
abundance number divided by the subtractant abundance
number). As is clear from perusal of the table, when the
abundance number in the subtractant library is "0", the
target abundance number is divided by 0.05. This is a way
of obtaining a result (division by 0 is not possible) and
of distinguishing the result from ratios computed with a
subtractant number of 1.
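
The ratio rule described for Table 4 can be sketched in Python as follows. The column names bgfreq, rfend and ratio and the 0.05 substitute divisor come from the text; representing each library as a dictionary of counts, the function name and the sort order are illustrative assumptions, not a reconstruction of the Table 5 program.

```python
from typing import Dict, List, Tuple

ZERO_SUBSTITUTE = 0.05  # divisor used when a gene is absent from the subtractant library


def subtraction_ratios(target: Dict[str, int],
                       subtractant: Dict[str, int]) -> List[Tuple[str, int, int, float]]:
    """Return (gene, bgfreq, rfend, ratio) rows, highest ratio first."""
    rows = []
    for gene in set(target) | set(subtractant):
        rfend = target.get(gene, 0)        # abundance in the target library
        bgfreq = subtractant.get(gene, 0)  # abundance in the subtractant library
        divisor = bgfreq if bgfreq > 0 else ZERO_SUBSTITUTE
        rows.append((gene, bgfreq, rfend, rfend / divisor))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


for row in subtraction_ratios({"g1": 4, "g2": 2}, {"g1": 2, "g3": 5}):
    print(row)
```

With this rule, a gene seen twice in the target and never in the subtractant scores 40, whereas the same two hits against a single subtractant hit would score 2, so genes absent from the background stand out at the top of the list.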

Table 5 is the computer program, written in source
code, for generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used
in the electronic northern blot analysis as provided by the
present invention.

4.2. BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a chart summarizing data collected and
stored regarding the library construction portion of
sequence preparation and analysis.

Figure 2 is a diagram representing the sequence of
operations performed by "abundance sort" software in a
class of preferred embodiments of the inventive method.

Figure 3 is a block diagram of a preferred embodiment
of the system of the invention.

Figure 4 is a more detailed block diagram of the
bioinformatics process from new sequence (that has already
been sequenced but not identified) to printout of the
transcript imaging analysis and the provision of database
subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the
relative abundance of gene transcripts in different
biological specimens by the use of high-throughput
sequence-specific analysis of individual RNAs or their
corresponding cDNAs (or, alternatively, of data representing
other biological sequences). This process is denoted
herein as gene transcript imaging. The quantitative
analysis of the relative abundance for a set of gene
transcripts is denoted herein as "gene transcript image
analysis" or "gene transcript frequency analysis". The
present invention allows one to obtain a profile for gene
transcription in any given population of cells or tissue
from any type of organism. The invention can be applied to
obtain a profile of a specimen consisting of a single cell
(or clones of a single cell), or of many cells, or of
tissue more complex than a single cell and containing
multiple cell types, such as liver.
The invention has significant advantages in the fields of diagnostics, toxicology and pharmacology, to name a few. A highly sophisticated diagnostic test can be performed on an ill patient in whom a diagnosis has not been made. A biological specimen consisting of the patient's fluids or tissues is obtained, and the gene transcripts are isolated and expanded to the extent necessary to determine their identity. Optionally, the gene transcripts can be converted to cDNA. A sampling of the gene transcripts is subjected to sequence-specific analysis and quantified. These gene transcript sequence abundances are compared against reference database sequence abundances, including data sets for diseased and healthy patients. The patient is indicated to have the disease(s) whose data set most closely correlates with the patient's data set.
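The correlation step can be sketched in Python as follows. This is a minimal illustration, not the program used in the invention: it assumes each data set is a mapping from gene identifier to relative abundance, uses Pearson correlation as a hypothetical choice of similarity measure, and treats genes absent from a profile as zero. The gene names and numbers are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length abundance vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def closest_reference(patient, references):
    """Return the label of the reference profile that best correlates with the patient's.

    patient: {gene: relative abundance}
    references: {label: {gene: relative abundance}}
    Genes missing from a profile are treated as abundance 0.
    """
    genes = sorted(set(patient).union(*references.values()))
    p = [patient.get(g, 0.0) for g in genes]
    scores = {
        label: pearson(p, [ref.get(g, 0.0) for g in genes])
        for label, ref in references.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical profiles for illustration only.
patient = {"gene_a": 0.5, "gene_b": 0.3, "gene_c": 0.2}
references = {
    "healthy": {"gene_a": 0.1, "gene_b": 0.3, "gene_c": 0.6},
    "disease_x": {"gene_a": 0.45, "gene_b": 0.35, "gene_c": 0.2},
}
label, scores = closest_reference(patient, references)  # label is "disease_x"
```

A real comparison would involve thousands of transcripts and would need a validated similarity measure; Pearson correlation is used here only because it is simple and standard.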

For example, gene transcript frequency analysis can be used to differentiate normal cells or tissues from diseased cells or tissues, just as it highlights differences between normal monocytes and activated macrophages in Table 3.

In toxicology, a fundamental question is which tests are most effective in predicting or detecting a toxic effect. Gene transcript imaging provides highly detailed information on the cell and tissue environment, some of which would not be obvious in conventional, less detailed screening methods. The gene transcript image is therefore a more powerful method than such screening for predicting drug toxicity and efficacy. Similar benefits accrue in the use of this tool in pharmacology. The gene transcript image can be used selectively to look at protein categories which are expected to be affected, for example, enzymes which detoxify toxins.

In an alternative embodiment, comparative gene transcript frequency analysis is used to differentiate between cancer cells which respond to anti-cancer agents and those which do not respond. Examples of anti-cancer agents are tamoxifen, vincristine, vinblastine, podophyllotoxins, etoposide, teniposide, cisplatin, biologic response modifiers such as interferon, IL-2 and GM-CSF, enzymes, hormones and the like. This method also provides a means for sorting the gene transcripts by functional category. In the case of cancer cells, transcription factors or other essential regulatory molecules are very important categories to analyze across different libraries.
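Sorting by functional category can be illustrated with a short Python sketch. It assumes a hypothetical annotation map from gene identifier to functional category (for example, "transcription factor") and simply totals relative abundances per category, so that categories can be compared across libraries. All identifiers and values are placeholders.

```python
from collections import defaultdict

def abundance_by_category(abundances, categories, default="unclassified"):
    """Sum relative abundances per functional category.

    abundances: {gene id: relative abundance}
    categories: {gene id: category label} (hypothetical annotation table)
    Genes without a category annotation are pooled under `default`.
    """
    totals = defaultdict(float)
    for gene, value in abundances.items():
        totals[categories.get(gene, default)] += value
    return dict(totals)

# Placeholder annotations and profiles for a responding and a non-responding line.
categories = {"tf_1": "transcription factor", "tf_2": "transcription factor",
              "struct_1": "structural"}
responder = {"tf_1": 0.020, "tf_2": 0.010, "struct_1": 0.050, "orphan_1": 0.001}
non_responder = {"tf_1": 0.002, "struct_1": 0.055}

resp = abundance_by_category(responder, categories)
nonresp = abundance_by_category(non_responder, categories)
```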

In yet another embodiment, comparative gene transcript frequency analysis is used to differentiate between control liver cells and liver cells isolated from patients treated with experimental drugs like FIAU, in order to distinguish between pathology caused by the underlying disease and that caused by the drug.

In yet another embodiment, comparative gene transcript 
frequency analysis is used to differentiate between brain 
tissue from patients treated and untreated with lithium. 

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between cyclosporin- and FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between virally infected (including HIV-infected) human cells and uninfected human cells. Gene transcript frequency analysis is also used to rapidly survey gene transcripts in HIV-resistant, HIV-infected, and HIV-sensitive cells. Comparison of gene transcript abundance can indicate the success of treatment and/or new avenues of study.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between bronchial lavage fluids from healthy and unhealthy patients with a variety of ailments.

In a further embodiment, comparative gene transcript frequency analysis is used to differentiate between cell, plant, microbial and animal mutants and wild-type species. In addition, the transcript abundance program is adapted to permit the scientist to evaluate the transcription of one gene in many different tissues. Such comparisons could
WO95/20681 PCTAJS95/01160 

identify deletion mutants which do not produce a gene product and point mutants which produce a less abundant or otherwise different message. Such mutations can affect basic biochemical and pharmacological processes, such as mineral nutrition and metabolism, and can be isolated by means known to those skilled in the art. Thus, crops with improved yields, pest resistance and other factors can be developed.

In a further embodiment, comparative gene transcript frequency analysis is used for an interspecies comparative analysis which allows for the selection of better pharmacologic animal models. In this embodiment, humans and other animals (such as a mouse), or their cultured cells, are treated with a specific test agent. The relative sequence abundance of each cDNA population is determined. If the animal test system is a good model, homologous genes in the animal cDNA population should change expression similarly to those in human cells. If side effects are detected with the drug, a detailed transcript abundance analysis is performed to survey gene transcript changes. Models are then evaluated by comparing basic physiological changes.
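One way to quantify whether homologous genes "change expression similarly" is sketched below. This is an illustrative assumption, not a procedure specified here: each gene's treated-to-untreated abundance ratio is classed as induced, repressed or unchanged using a hypothetical two-fold threshold, and the score is the fraction of homologous human/animal gene pairs that fall in the same class. The homology map and ratios are invented.

```python
def direction(ratio, threshold=2.0):
    """+1 for induced, -1 for repressed, 0 for unchanged (treated/untreated ratio)."""
    if ratio >= threshold:
        return 1
    if ratio <= 1.0 / threshold:
        return -1
    return 0

def concordance(human_ratios, animal_ratios, homologs, threshold=2.0):
    """Fraction of homologous gene pairs whose expression changes in the same direction.

    human_ratios, animal_ratios: {gene: treated-to-untreated abundance ratio}
    homologs: {human gene: animal homolog} (a hypothetical homology map)
    """
    pairs = [(h, a) for h, a in homologs.items()
             if h in human_ratios and a in animal_ratios]
    if not pairs:
        return 0.0
    agree = sum(direction(human_ratios[h], threshold) == direction(animal_ratios[a], threshold)
                for h, a in pairs)
    return agree / len(pairs)

# Invented ratios: H1/M1 both induced, H2 repressed but M2 unchanged, H3/M3 both unchanged.
human = {"H1": 4.0, "H2": 0.25, "H3": 1.0}
mouse = {"M1": 3.0, "M2": 1.1, "M3": 0.9}
score = concordance(human, mouse, {"H1": "M1", "H2": "M2", "H3": "M3"})  # 2 of 3 pairs agree
```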

In a further embodiment, comparative gene transcript frequency analysis is used in a clinical setting to give a highly detailed gene transcript profile of a patient's cells or tissue (for example, a blood sample). In particular, gene transcript frequency analysis is used to give a high-resolution gene expression profile of a diseased state or condition.

In the preferred embodiment, the method utilizes high-throughput cDNA sequencing to identify specific transcripts of interest. The generated cDNA and deduced amino acid sequences are then extensively compared with GENBANK and other sequence data banks as described below. The method offers several advantages over current protein discovery by two-dimensional gel methods, which try to identify individual proteins involved in a particular biological effect. Here, detailed comparisons of profiles of activated and inactive cells reveal numerous changes in the expression of individual transcripts. After it is determined whether the sequence is an "exact" match, a similar match or a non-match, the sequence is entered into a database. Next, the numbers of copies of cDNA corresponding to each gene are tabulated. Although this can be done slowly and arduously, if at all, by hand from a printout of all entries, a computer program is a useful and rapid way to tabulate this information. The numbers of cDNA copies (optionally divided by the total number of sequences in the data set) provide a picture of the relative abundance of transcripts for each corresponding gene. The list of represented genes can then be sorted by abundance in the cDNA population. A multitude of additional types of comparisons or dimensions are possible and are exemplified below.
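The tabulation and sorting just described can be sketched in Python. This is a minimal sketch, not the program of Table 5. It assumes each sequenced clone has already been assigned an identified sequence value (here simply a gene name), counts copies per value with `collections.Counter`, optionally divides by the library size, and sorts by abundance. The clone identities are placeholders.

```python
from collections import Counter

def abundance_sort(identified_values, relative=True):
    """Tabulate copies per identified sequence value and sort by abundance (descending).

    identified_values: one entry per sequenced clone, e.g. the gene each clone matched.
    relative=True divides each count by the total number of sequences in the library.
    """
    counts = Counter(identified_values)
    total = len(identified_values)
    rows = [(gene, n / total if relative else n) for gene, n in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))  # ties broken alphabetically
    return rows

# Placeholder library of six sequenced clones.
library = ["actin", "actin", "lysozyme", "actin", "ferritin", "lysozyme"]
profile = abundance_sort(library)
```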

An alternate method of producing a gene transcript image includes the steps of obtaining a mixture of test mRNA and providing a representative array of unique probes whose sequences are complementary to at least some of the test mRNAs. Next, a fixed amount of the test mRNA is added to the arrayed probes. The test mRNA is incubated with the probes for a sufficient time to allow hybrids of the test mRNA and probes to form. The mRNA-probe hybrids are detected and their quantity is determined. The hybrids are identified by their location in the probe array. The quantities of all hybrids are summed to give a population number. Each hybrid quantity is divided by the population number to provide a set of relative abundance data termed a gene transcript image analysis.
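The normalization in this array-based method reduces to dividing each hybrid quantity by their sum. The Python sketch below assumes the detected quantities are keyed by array position and that a separate, hypothetical layout table maps each position to the transcript its probe detects. The signal values are invented.

```python
def transcript_image(hybrid_quantities, probe_layout):
    """Convert measured hybrid quantities on a probe array into relative abundances.

    hybrid_quantities: {(row, col): detected quantity}
    probe_layout: {(row, col): transcript id}; the array position identifies the hybrid.
    """
    population = sum(hybrid_quantities.values())  # the "population number"
    if population <= 0:
        raise ValueError("no hybridization signal detected")
    return {probe_layout[pos]: q / population for pos, q in hybrid_quantities.items()}

layout = {(0, 0): "transcript_A", (0, 1): "transcript_B", (1, 0): "transcript_C"}
signals = {(0, 0): 30.0, (0, 1): 10.0, (1, 0): 60.0}
image = transcript_image(signals, layout)
```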

6. EXAMPLES

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 

6.1. TISSUE SOURCES AND CELL LINES

For analysis with the computer program claimed herein, biological sequences can be obtained from virtually any source. Most popular are tissues obtained from the human body. Tissues can be obtained from any organ of the body, any age donor, any abnormality or any immortalized cell line. Immortal cell lines may be preferred in some instances because of their purity of cell type; other tissue samples invariably include mixed cell types. A special technique is available to take a single cell (for example, a brain cell) and harness the cellular machinery to grow up sufficient cDNA for sequencing by the techniques and analysis described herein (cf. U.S. Patent Nos. 5,021,335 and 5,168,038, which are incorporated by reference). The examples given herein utilized the following immortalized cell lines: monocyte-like U-937 cells, activated macrophage-like THP-1 cells, induced vascular endothelial cells (HUVEC cells) and mast cell-like HMC-1 cells.

The U-937 cell line is a human histiocytic lymphoma cell line with monocyte characteristics, established from malignant cells obtained from the pleural effusion of a patient with diffuse histiocytic lymphoma (Sundstrom, C. and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is one of only a few human cell lines with the morphology, cytochemistry, surface receptors and monocyte-like characteristics of histiocytic cells. These cells can be induced to terminal monocytic differentiation and will express new cell surface molecules when activated with supernatants from human mixed lymphocyte cultures. Upon this type of in vitro activation, the cells undergo morphological and functional changes, including augmentation of antibody-dependent cellular cytotoxicity (ADCC) against erythroid and tumor target cells (one of the principal functions of macrophages). Activation of U-937 cells with phorbol 12-myristate 13-acetate (PMA) in vitro stimulates the production of several compounds, including prostaglandins, leukotrienes and platelet-activating factor (PAF), which are potent inflammatory mediators. Thus, U-937 is a cell line that is well suited for the identification and isolation of gene transcripts associated with normal monocytes.
The HUVEC cell line is a normal, homogeneous, well characterized, early passage endothelial cell culture from human umbilical vein (Cell Systems Corp., 12815 NE 124th Street, Kirkland, WA 98034). Only gene transcripts from induced, or treated, HUVEC cells were sequenced. One batch of 1 × 10^8 cells was treated for 5 hours with 1 U/ml rIL-1b and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin
prior to harvesting. A separate batch of 2 × 10^8 cells was treated at confluence with 4 U/ml TNF and 2 U/ml interferon-gamma (IFN-gamma) prior to harvesting.

THP-1 is a human leukemic cell line with distinct monocytic characteristics. This cell line was derived from the blood of a 1-year-old boy with acute monocytic leukemia (Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The following cytological and cytochemical criteria were used to determine the monocytic nature of the cell line: 1) the presence of alpha-naphthyl butyrate esterase activity which could be inhibited by sodium fluoride; 2) the production of lysozyme; 3) the phagocytosis of latex particles and sensitized SRBC (sheep red blood cells); and 4) the ability of mitomycin C-treated THP-1 cells to activate T-lymphocytes following ConA (concanavalin A) treatment. Morphologically, the cytoplasm contained small azurophilic granules and the nucleus was indented and irregularly shaped with deep folds. The cell line had Fc and C3b receptors, probably functioning in phagocytosis. THP-1 cells treated with the tumor promoter 12-O-tetradecanoylphorbol-13-acetate (TPA) stop proliferating and differentiate into macrophage-like cells which mimic native monocyte-derived macrophages in several respects. Morphologically, as the cells change shape, the nucleus becomes more irregular and additional phagocytic vacuoles appear in the cytoplasm. The differentiated THP-1 cells also exhibit an increased adherence to tissue culture plastic.

HMC-1 cells (a human mast cell line) were established from the peripheral blood of a Mayo Clinic patient with mast cell leukemia (Leukemia Res. (1988) 12:345-55). The cultured cells looked similar to immature cloned murine mast cells, contained histamine, and stained positively for chloroacetate esterase, amino caproate esterase, eosinophil major basic protein (MBP) and tryptase. The HMC-1 cells have, however, lost the ability to synthesize normal IgE receptors. HMC-1 cells also possess a 10;16 translocation, which was present in cells initially collected by leukapheresis from the patient and is therefore not an artifact of culturing. Thus, HMC-1 cells are a good model for mast cells.

6.2. CONSTRUCTION OF cDNA LIBRARIES

For inter-library comparisons, the libraries must be prepared in similar manners. Certain parameters appear to be particularly important to control. One such parameter is the method of isolating mRNA. It is important to use the same conditions to remove DNA and heterogeneous nuclear RNA from comparison libraries. Size fractionation of cDNA must be carefully controlled. The same vector preferably should be used for preparing libraries to be compared. At the very least, the same type of vector (e.g., unidirectional vector) should be used to assure a valid comparison. A unidirectional vector may be preferred in order to more easily analyze the output.

It is preferred to prime only with oligo dT unidirectional primer in order to obtain only one clone per mRNA transcript when obtaining cDNAs. However, it is recognized that employing a mixture of oligo dT and random primers can also be advantageous, because such a mixture results in more sequence diversity when gene discovery also is a goal. Similar effects can be obtained with DR2 (Clontech) and HXLOX (US Biochemical) and also with vectors from Invitrogen and Novagen. These vectors have two requirements. First, there must be primer sites for commercially available primers such as T3 or M13 reverse primers. Second, the vector must accept inserts up to 10 kb.

It also is important that the clones be randomly sampled, and that a significant population of clones is used. Data have been generated with 5,000 clones; however, if very rare genes are to be obtained and/or their relative abundance determined, as many as 100,000 clones from a single library may need to be sampled. Size fractionation of cDNA also must be carefully controlled. Alternately, plaques can be selected, rather than clones.
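The need for large clone numbers when rare transcripts matter can be estimated with a simple sampling model. The sketch below is an approximation under stated assumptions, not part of the described method: it treats each randomly picked clone as an independent draw in which a given transcript appears with probability equal to its relative abundance, and it ignores cloning bias.

```python
import math

def detection_probability(freq, n_clones):
    """Probability that n randomly sampled clones include at least one copy of a
    transcript present at relative frequency `freq` (independent draws assumed)."""
    return 1.0 - (1.0 - freq) ** n_clones

def clones_needed(freq, confidence=0.95):
    """Smallest clone count whose detection probability reaches `confidence`."""
    if not 0.0 < freq < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))
```

Under this model, 5,000 clones detect a transcript present at 1 in 10,000 with a probability of only about 0.39, and reaching 95% confidence for a transcript at 1 in 30,000 takes roughly 90,000 clones, the same order of magnitude as the 100,000 clones mentioned above.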
Besides the Uni-ZAP™ vector system by Stratagene disclosed below, it is now believed that other similarly unidirectional vectors also can be used. For example, it is believed that such vectors include, but are not limited to, DR2 (Clontech) and HXLOX (U.S. Biochemical).

Preferably, the details of library construction (as shown in Figure 1) are collected and stored in a database for later retrieval relative to the sequences being compared. Fig. 1 shows important information regarding the library collaborator or cell or cDNA supplier, pretreatment, biological source, culture, mRNA preparation and cDNA construction. Similarly detailed information about the other steps is beneficial in analyzing sequences and libraries in depth.
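A stored library-construction record of the kind Figure 1 summarizes might look like the Python sketch below. The field names are hypothetical stand-ins for the categories listed for Fig. 1. The helper reports which construction fields differ between two libraries, reflecting the requirement above that libraries to be compared be prepared in similar manners. The priming values paraphrase the U-937 and THP-1 construction described in this section; the remaining field values are placeholders.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class LibraryRecord:
    """Library-construction details of the kind summarized in Figure 1 (field names hypothetical)."""
    name: str
    supplier: str
    pretreatment: str
    biological_source: str
    culture: str
    mrna_preparation: str
    cdna_construction: str

BIOLOGY_FIELDS = ("name", "supplier", "pretreatment", "biological_source", "culture")

def construction_differences(a, b, ignore=BIOLOGY_FIELDS):
    """List the preparation fields that differ between two library records."""
    return [f.name for f in fields(LibraryRecord)
            if f.name not in ignore and getattr(a, f.name) != getattr(b, f.name)]

u937 = LibraryRecord("U-937", "Stratagene", "(placeholder)", "histiocytic lymphoma line",
                     "(placeholder)", "poly(A+) RNA", "Uni-ZAP, oligo dT primed")
thp1 = LibraryRecord("THP-1", "Stratagene", "TPA 48 h, LPS 4 h", "monocytic leukemia line",
                     "(placeholder)", "poly(A+) RNA", "Uni-ZAP, oligo dT + random hexamer primed")
diff = construction_differences(u937, thp1)  # flags the differing priming strategy
```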

RNA must be harvested from cells and tissue samples, and cDNA libraries are subsequently constructed. cDNA libraries can be constructed according to techniques known in the art (see, for example, Maniatis, T. et al. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, New York). cDNA libraries may also be purchased. The U-937 cDNA library (catalog No. 937207) was obtained from Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA 92037.

The THP-1 cDNA library was custom constructed by Stratagene from THP-1 cells cultured 48 hours with 100 nM TPA and 4 hours with 1 µg/ml LPS. The human mast cell HMC-1 cDNA library was also custom constructed by Stratagene from cultured HMC-1 cells. The HUVEC cDNA library was custom constructed by Stratagene from two batches of induced HUVEC cells which were separately processed.

Essentially, all the libraries were prepared in the same manner. First, poly(A+)RNA (mRNA) was purified. For the U-937 and HMC-1 RNA, cDNA synthesis was only primed with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis was primed separately with both oligo dT and random
hexamers, and the two cDNA libraries were treated separately. Synthetic adaptor oligonucleotides were ligated onto cDNA ends, enabling their insertion into the Uni-ZAP™ vector system (Stratagene), which allows high-efficiency unidirectional (sense orientation) lambda library construction and the convenience of a plasmid system with blue-white color selection to detect clones with cDNA insertions. Finally, the two libraries were combined into a single library by mixing equal numbers of bacteriophage.

The libraries can be screened with either DNA probes or antibody probes, and the pBluescript® phagemid (Stratagene) can be rapidly excised in vivo. The phagemid allows the use of a plasmid system for easy insert characterization, sequencing, site-directed mutagenesis, the creation of unidirectional deletions and expression of fusion proteins. The custom-constructed library phage particles were infected into E. coli host strain XL1-Blue® (Stratagene), which has a high transformation efficiency, increasing the probability of obtaining rare, under-represented clones in the cDNA library.



6.3. ISOLATION OF cDNA CLONES 

The phagemid forms of individual cDNA clones were obtained by the in vivo excision process, in which the host bacterial strain was coinfected with both the lambda library phage and an f1 helper phage. Proteins derived from both the library-containing phage and the helper phage nicked the lambda DNA, initiated new DNA synthesis from defined sequences on the lambda target DNA and created a smaller, single-stranded circular phagemid DNA molecule that included all DNA sequences of the pBluescript® plasmid and the cDNA insert. The phagemid DNA was secreted from the cells and purified, then used to re-infect fresh host cells, where the double-stranded phagemid DNA was produced. Because the phagemid carries the gene for beta-lactamase, the newly transformed bacteria were selected on medium containing ampicillin.

Phagemid DNA was purified using the Magic Minipreps™ DNA Purification System (Promega catalogue #A7100; Promega Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This small-scale process provides a simple and reliable method for lysing the bacterial cells and rapidly isolating purified phagemid DNA using a proprietary DNA-binding resin. The DNA was eluted from the purification resin already prepared for DNA sequencing and other analytical manipulations.

Phagemid DNA was also purified using the QIAwell-8 Plasmid Purification System from QIAGEN® DNA Purification System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA 91311). This product line provides a convenient, rapid and reliable high-throughput method for lysing the bacterial cells and isolating highly purified phagemid DNA using QIAGEN anion-exchange resin particles with EMPORE™ membrane technology from 3M in a multiwell format. The DNA was eluted from the purification resin already prepared for DNA sequencing and other analytical manipulations.

An alternate method of purifying phagemid has recently become available. It utilizes the Miniprep Kit (Catalog No. 77468, available from Advanced Genetic Technologies Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This kit is in the 96-well format and provides enough reagents for 960 purifications. Each kit is provided with a recommended protocol, which has been employed except for the following changes. First, the 96 wells are each filled with only 1 ml of sterile terrific broth with carbenicillin at 25 mg/L and glycerol at 0.4%. After the wells are inoculated, the bacteria are cultured for 24 hours and lysed with 60 µl of lysis buffer. A centrifugation step (2900 rpm for 5 minutes) is performed before the contents of the block are added to the primary filter plate. The optional step of adding isopropanol to TRIS buffer is not routinely performed. After the last step in the protocol, samples are transferred to a Beckman 96-well block for storage.

Another new DNA purification system is the WIZARD™ 
product line which is available from Promega (catalog No. 
A7071) and may be adaptable to the 96-well format. 
6.4. SEQUENCING OF cDNA CLONES

The cDNA inserts from random isolates of the U-937 and THP-1 libraries were sequenced in part. Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ DNA polymerase Klenow fragment, Sequenase™ or Taq polymerase to extend DNA chains from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for the use of both single- and double-stranded templates. The chain termination reaction products are usually electrophoresed on urea-acrylamide gels and are detected either by autoradiography (for radionuclide-labeled precursors) or by fluorescence (for fluorescent-labeled precursors). Recent improvements in mechanized reaction preparation, sequencing and analysis using the fluorescent detection method have permitted expansion in the number of sequences that can be determined per day (for example, with the Applied Biosystems 373 and 377 DNA sequencers and the Catalyst 800). Currently, with the system as described, read lengths range from 250 to 400 bases and are clone dependent. Read length also varies with the length of time the gel is run. In general, shorter runs tend to truncate the sequence. A minimum of only about 25 to 50 bases is necessary to establish the identification and degree of homology of the sequence.

Gene transcript imaging can be used with any sequence-specific method, including, but not limited to, hybridization, mass spectroscopy, capillary electrophoresis
and 505 gel electrophoresis. 



Using the nucleotide sequences derived from the cDNA clones as query sequences (sequences of a Sequence Listing), databases containing previously identified sequences are searched for areas of homology (similarity). Examples of such databases include GenBank and EMBL. We next describe examples of two homology search algorithms that can be used, and then describe the subsequent computer-implemented steps to be performed in accordance with preferred embodiments of the invention.



6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND DEDUCED PROTEIN (AND SUBSEQUENT STEPS)
In the following description of the computer-implemented steps of the invention, the word "library" denotes a set (or population) of biological specimen nucleic acid sequences. A "library" can consist of cDNA sequences, RNA sequences, or the like, which characterize a biological specimen. The biological specimen can consist of cells of a single human cell type (or can be any of the other above-mentioned types of specimens). We contemplate that the sequences in a library have been determined so as to accurately represent or characterize a biological specimen (for example, they can consist of representative cDNA sequences from clones of RNA taken from a single human cell).

In the following description of the computer-implemented steps of the invention, the expression "database" denotes a set of stored data which represent a collection of sequences, which in turn represent a collection of biological reference materials. For example, a database can consist of data representing many stored cDNA sequences which are in turn representative of human cells infected with various viruses, cells of humans of various ages, cells from different mammalian species, and so on.

In preferred embodiments, the invention employs a computer programmed with software (to be described) for performing the following steps:

(a) processing data indicative of a library of cDNA sequences (generated as a result of high-throughput cDNA sequencing or another method) to determine whether each sequence in the library matches a DNA sequence of a reference database of DNA sequences (and if so, identifying the reference database entry which matches the sequence and indicating the degree of match between the reference sequence and the library sequence), and assigning to each of the sequences in the library an identified sequence value based on the sequence annotation and degree of match;

(b) for some or all entries of the database, tabulating the number of matching identified sequence values in the library, thereby generating a set of final data values or "abundance numbers" (although this can be done by hand from a printout of all entries, we prefer to perform this step using computer software to be described below); and

(c) if the libraries are different sizes, dividing each abundance number by the total number of sequences in the library, to obtain a relative abundance number for each identified sequence value (i.e., a relative abundance of each gene transcript).

The list of identified sequence values (or genes 
corresponding thereto) can then be sorted by abundance in 
the cDNA population. A multitude of additional types of 
comparisons or dimensions are possible. 
For example (to be described below in greater detail), steps (a) and (b) can be repeated for two different libraries (sometimes referred to as a "target" library and a "subtractant" library). Then, for each identified sequence value (or gene transcript), a "ratio" value is obtained by dividing the abundance number (for that identified sequence value) for the target library by the abundance number (for that identified sequence value) for the subtractant library.
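The target/subtractant ratio can be sketched in Python. This sketch assumes each library is a list of identified sequence values, one per sequenced clone, and optionally converts counts to relative abundances before dividing, as step (c) suggests for libraries of different sizes. Reporting a zero subtractant count as `math.inf` is a hypothetical convention chosen here, not one taken from the program of Table 5. The gene labels are placeholders.

```python
import math
from collections import Counter

def subtraction_ratios(target, subtractant, relative=True):
    """Target/subtractant abundance ratio for every identified sequence value.

    target, subtractant: lists of identified sequence values, one per sequenced clone.
    relative=True first divides counts by library size (for libraries of different sizes).
    A value absent from the subtractant is reported as math.inf (hypothetical convention).
    """
    t, s = Counter(target), Counter(subtractant)
    nt, ns = len(target), len(subtractant)
    ratios = {}
    for value in set(t) | set(s):
        a = t[value] / nt if relative and nt else t[value]
        b = s[value] / ns if relative and ns else s[value]
        ratios[value] = a / b if b else math.inf
    return ratios

target = ["il1b"] * 4 + ["actin"] * 3 + ["novel_1"]
subtractant = ["il1b"] + ["actin"] * 9
r = subtraction_ratios(target, subtractant)
```

Calling the function with the two libraries swapped gives the reverse ratio, so any number of subtractions can be run against the same stored data.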

In fact, subtraction may be carried out on multiple libraries. It is possible to add the transcripts from several libraries (for example, three) and then to divide them by another set of transcripts from multiple libraries (again, for example, three). Notation for this operation may be abbreviated as (A+B+C)/(D+E+F), where the capital letters each indicate an entire library. Optionally, the abundance numbers of transcripts in the summed libraries may be divided by the total sample size before subtraction.
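Written out for the pooled case, the ratio for an identified sequence value g is given below. Here n_{L,g} is the number of sequences in library L assigned to g and N_L is the total number of sequences in library L; both symbols are introduced only for this illustration. Dropping the N terms gives the unnormalized form. For instance, if g is counted 4 times among 6 pooled target sequences and 2 times among 6 pooled subtractant sequences, the ratio is 2.

```latex
R_g = \frac{(n_{A,g} + n_{B,g} + n_{C,g}) \,/\, (N_A + N_B + N_C)}
           {(n_{D,g} + n_{E,g} + n_{F,g}) \,/\, (N_D + N_E + N_F)}
\qquad \text{e.g.} \qquad
R_g = \frac{4/6}{2/6} = 2
```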

Unlike standard hybridization technology, which permits a single subtraction of two libraries, once one has processed a set of library transcript sequences and stored them in the computer, any number of subtractions can be performed on the library. For example, by this method, ratio values can be obtained by dividing relative abundance values in a first library by corresponding values in a second library and vice versa.

In variations on step (a), the library consists of nucleotide sequences derived from cDNA clones. Examples of databases which can be searched for areas of homology (similarity) in step (a) include the commercially available databases known as GenBank (NIH), EMBL (European Molecular Biology Labs, Germany) and GENESEQ (Intelligenetics, Mountain View, California).
One homology search algorithm which can be used to implement step (a) is the algorithm described in the paper by D.J. Lipman and W.R. Pearson, entitled "Rapid and Sensitive Protein Similarity Searches," Science, 227:1435 (1985). In this algorithm, the homologous regions are searched in a two-step manner. In the first step, the regions of highest homology are determined by calculating a matching score using a homology score table. The parameter "Ktup" is used in this step to establish the minimum window size to be shifted for comparing two sequences. Ktup also sets the number of bases that must match to extract the highest homologous region among the sequences. In this step, no insertions or deletions are applied, and the homology is displayed as an initial (INIT) value.
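The k-tuple idea behind the first step can be illustrated with a simplified Python sketch. It is not the full Lipman-Pearson algorithm. It only indexes every word of length `ktup` in one sequence, looks up the words of the other, and counts exact word hits on each diagonal (query offset minus subject offset). The densest diagonal marks a candidate ungapped region of the kind scored as INIT. Scoring tables and region joining are omitted, and the tie-breaking rule is an arbitrary choice.

```python
from collections import defaultdict

def ktup_diagonals(query, subject, ktup=2):
    """Count exact k-tuple word hits on each diagonal (offset = i_query - j_subject)."""
    index = defaultdict(list)  # word -> start positions in subject
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)
    hits = defaultdict(int)
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            hits[i - j] += 1
    return dict(hits)

def best_diagonal(query, subject, ktup=2):
    """Diagonal with the most word hits (ties go to the diagonal nearest zero)."""
    hits = ktup_diagonals(query, subject, ktup)
    if not hits:
        return None, 0
    diag = max(hits, key=lambda d: (hits[d], -abs(d)))
    return diag, hits[diag]

# "GATTACA" sits at offset 2 of the subject, so all six 2-mer hits fall on diagonal -2.
diag, count = best_diagonal("GATTACA", "TTGATTACA")
```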

In the second step, the homologous regions are aligned to obtain the highest matching score by inserting a gap in order to add a probable deleted portion. The matching score obtained in the first step is recalculated using the homology score table and the insertion score table to an optimized (OPT) value in the final output.
DNA homologies between two sequences can be examined graphically using the Harr method of constructing dot matrix homology plots (Needleman, S.B. and Wunsch, C.D., J. Mol. Biol. 48:443 (1970)). This method produces a two-dimensional plot which can be useful in determining regions of homology versus regions of repetition.
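A dot matrix plot of this kind can be sketched in a few lines of Python. The sketch assumes a simple windowed rule (a hypothetical window of 3 bases, all of which must match) and prints `*` for a hit. The main diagonal shows the homologous region, while off-diagonal hits reveal internal repetition.

```python
def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Mark cell (i, j) when the windows starting at i and j share at least
    `min_matches` identical bases."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = []
        for j in range(len(seq_b) - window + 1):
            same = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            row.append(same >= min_matches)
        rows.append(row)
    return rows

def render(matrix):
    """Text rendering: '*' for a hit, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)

# The repeated "ACG" shows up as off-diagonal hits in the corners.
print(render(dot_matrix("ACGACG", "ACGACG")))
```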

However, in a class of preferred embodiments, step (a) is implemented by processing the library data in the commercially available computer program known as the INHERIT 670 Sequence Analysis System, available from Applied Biosystems Inc. (Foster City, California), including the software known as the Factura software (also available from Applied Biosystems Inc.). The Factura program preprocesses each library sequence to "edit out" portions thereof which are not likely to be of interest, such as the vector used to prepare the library. Additional sequences which can be edited out or masked (ignored by the search tools) include, but are not limited to, the polyA tail and repetitive GAG and CCC sequences. A low-end search program can be written to mask out such "low-information" sequences, or programs such as BLAST can ignore the low-information sequences.
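The kind of preprocessing described for the Factura step can be sketched with Python regular expressions. The vector prefix, minimum polyA length and repeat thresholds below are hypothetical placeholders, not Factura's actual rules. The sketch trims a known vector sequence from the start of a read, strips a trailing polyA run, and masks runs of GAG or CCC repeats with `N` so that later search steps ignore them.

```python
import re

def preprocess(read, vector_prefix, min_polya=8, min_repeat_units=4):
    """Trim a known vector prefix, strip a trailing polyA tail, mask GAG/CCC repeats."""
    if read.startswith(vector_prefix):
        read = read[len(vector_prefix):]
    # Remove a trailing run of at least `min_polya` A's (the polyA tail).
    read = re.sub(r"A{%d,}$" % min_polya, "", read)
    # Replace each run of >= min_repeat_units GAG or CCC units with the same number of N's.
    repeats = r"(?:GAG){%d,}|(?:CCC){%d,}" % (min_repeat_units, min_repeat_units)
    return re.sub(repeats, lambda m: "N" * len(m.group(0)), read)

VECTOR = "GGCACGAG"  # hypothetical vector sequence, for illustration only
clean = preprocess(VECTOR + "ATGCGT" + "GAG" * 4 + "TTC" + "A" * 10, VECTOR)
```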

In the algorithm implemented by the INHERIT 670 Sequence Analysis System, the Pattern Specification Language (developed by TRW Inc.) is used to determine regions of homology. "There are three parameters that determine how INHERIT analysis runs sequence comparisons: window size, window offset and error tolerance. Window size specifies the length of the segments into which the query sequence is subdivided. Window offset specifies where to start the next segment [to be compared], counting from the beginning of the previous segment. Error tolerance specifies the total number of insertions, deletions and/or substitutions that are tolerated over the specified word length. Error tolerance may be set to any integer between 0 and 6. The default settings are window tolerance=20, window offset=10 and error tolerance=3." INHERIT Analysis Users Manual, pp. 2-15, Version 1.0, Applied Biosystems, Inc., October 1991.
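The window scheme quoted above can be approximated in Python as follows. This is not INHERIT's Pattern Specification Language. It substitutes standard approximate substring matching (Sellers' dynamic-programming variant of edit distance), so that each insertion, deletion or substitution counts as one error. The defaults mirror the quoted settings, reading the quoted "window tolerance=20" as the window size; the sequences are invented.

```python
def windows(query, size=20, offset=10):
    """Split the query into windows of `size` bases starting every `offset` bases."""
    return [(start, query[start:start + size])
            for start in range(0, max(len(query) - size, 0) + 1, offset)]

def best_approx_distance(pattern, text):
    """Smallest edit distance between `pattern` and any substring of `text` (Sellers)."""
    prev = [0] * (len(text) + 1)  # an empty pattern matches anywhere at cost 0
    for i, p in enumerate(pattern, start=1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            cur[j] = min(prev[j - 1] + (p != t),  # match or substitution
                         prev[j] + 1,             # pattern base deleted
                         cur[j - 1] + 1)          # extra base inserted
        prev = cur
    return min(prev)

def matching_windows(query, reference, size=20, offset=10, tolerance=3):
    """(start, errors) for every query window found in `reference` within `tolerance` errors."""
    hits = []
    for start, win in windows(query, size, offset):
        errors = best_approx_distance(win, reference)
        if errors <= tolerance:
            hits.append((start, errors))
    return hits

reference = "ACGTTGCA" * 5
mutated = "ACGA" + reference[4:30]  # one substitution inside the first window
hits = matching_windows(mutated, reference)
```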

Using a combination of these three parameters, a
database (such as a DNA database) can be searched for
sequences containing regions of homology, and the
appropriate sequences are scored with an initial value.
Subsequently, these homologous regions are examined using
dot matrix homology plots to distinguish regions of
homology from regions of repetition. Smith-Waterman
alignments can be used to display the results of the
homology search. The INHERIT software can be executed by a
Sun computer system running the UNIX operating system.
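A dot matrix homology plot can be sketched with a simple word-match rule. A dot is placed at (i, j) whenever the `word`-long segments starting at position i of one sequence and position j of the other are identical. The function name and the default word length of 3 are hypothetical. Homologous regions then appear as diagonal runs of dots. Repetitive sequence appears as blocks or as parallel off-diagonal runs.

```python
def dot_matrix(seq_a, seq_b, word=3):
    """Hypothetical sketch: set of (i, j) cells where the `word`-long
    substrings starting at seq_a[i] and seq_b[j] are identical."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up each word of seq_a and record every matching cell.
    return {(i, j)
            for i in range(len(seq_a) - word + 1)
            for j in index.get(seq_a[i:i + word], [])}
```

Comparing a sequence with itself gives the main diagonal. A run such as "AAAA" gives a filled block, which is the signature of repetition rather than homology.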
Search alternatives to INHERIT include the BLAST
program, GCG (available from the Genetics Computer Group,
WI) and the Dasher program (Temple Smith, Boston
University, Boston, MA). Nucleotide sequences can be
searched against GenBank, EMBL or custom databases such as
GENESEQ (available from Intelligenetics, Mountain View, CA)
or other databases for genes. In addition, we have
searched some sequences against our own in-house database.

In preferred embodiments, the transcript sequences are
analyzed by the INHERIT software for best conformance with
a reference gene transcript, so as to assign each sequence
a sequence identifier and a degree of homology, which
together constitute the identified sequence value. These
values are input into, and further processed by, a
Macintosh personal computer (available from Apple)
programmed with an "abundance sort and subtraction
analysis" computer program (described below).

Prior to the abundance sort and subtraction analysis
program (also denoted as the "abundance sort" program),
identified sequences from the cDNA clones are assigned a
value (according to the parameters given above) by degree
of match according to the following categories: "exact"
matches (regions with a high degree of identity),
homologous human matches (regions of high similarity, but
not "exact" matches), homologous non-human matches (regions
of high similarity present in species other than human), or
non-matches (no significant regions of homology to
previously identified nucleotide sequences stored in the
database). Alternately, the degree of match can be a
numeric value as described below.
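The four degree-of-match categories can be expressed as a small decision function. The sketch assumes the search reports a fractional identity for the best hit (or no hit at all), plus whether the matched entry is human. The 0.95 and 0.70 cutoffs are hypothetical placeholders, not values taken from this disclosure.

```python
from typing import Optional

def match_category(identity: Optional[float], human: bool,
                   exact_cutoff: float = 0.95, homolog_cutoff: float = 0.70) -> str:
    """Hypothetical sketch: assign one of the four degree-of-match categories.

    identity: fraction of identical bases in the best alignment, or None if the
    search found no significant hit. Both cutoffs are assumed placeholders.
    """
    if identity is None or identity < homolog_cutoff:
        return "non-match"
    if not human:
        return "homologous non-human"
    if identity >= exact_cutoff:
        return "exact"
    return "homologous human"
```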
With reference again to the step of identifying
matches between reference sequences and database entries,
protein and peptide sequences can be deduced from the
nucleic acid sequences. Using the deduced polypeptide
sequence, the match identification can be performed in a
manner analogous to that done with cDNA sequences. A
protein sequence is used as a query sequence and compared
to the previously identified sequences contained in a
database such as the Swiss-Prot, PIR and NBRF protein
databases to find homologous proteins. These proteins are
initially scored for homology using a homology score table
(Orcutt, B.C. and Dayhoff, M.O., Scoring Matrices, PIR
Report MAT-0285 (February 1985)), resulting in an INIT
score. The homologous regions are then aligned to obtain
the highest matching scores by inserting gaps that account
for probable deletions. The matching score is recalculated
using the homology score table and the insertion score
table, resulting in an optimized (OPT) score. Even in the
absence of knowledge of the proper reading frame of an
isolated sequence, the above-described protein homology
search may be performed by searching all three reading
frames.
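Searching all three reading frames first requires deducing a peptide from each frame. Below is a minimal sketch using the standard genetic code. The function names are hypothetical. `*` marks a stop codon, and `X` marks a codon containing an ambiguous base. Only the three frames of the sense strand are translated, matching the three frames mentioned above.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate one reading frame; incomplete trailing codons are dropped."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(frame, len(seq) - 2, 3))

def three_frame_translations(seq):
    """Hypothetical helper: deduced peptides for reading frames 1-3 of the sense strand."""
    return [translate(seq, f) for f in range(3)]
```

Each of the three peptides could then be used as a query sequence in the protein homology search.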

Peptide and protein sequence homologies can also be
ascertained using the INHERIT 670 Sequence Analysis System
in a manner analogous to that used for DNA sequence
homologies. Pattern Specification Language and parameter
windows are used to search protein databases for sequences
containing regions of homology, which are scored with an
initial value. Subsequent display in a dot-matrix homology
plot distinguishes regions of homology from regions of
repetition. Additional search tools that are available for
use on pattern search databases include PLsearch Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Dasher and GCG. Pattern search
databases include, but are not limited to, Protein Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Brookhaven Protein (available from
the Brookhaven National Laboratory, Upton, NY), PROSITE
(available from Amos Bairoch, University of Geneva,
Switzerland), ProDom (available from Temple Smith, Boston
University), and PROTEIN MOTIF FINGERPRINT (available from
University of Leeds, United Kingdom).

The ABI Assembler application software, part of the
INHERIT DNA analysis system (available from Applied
Biosystems, Inc., Foster City, CA), can be employed to
create and manage sequence assembly projects by assembling
data from selected sequence fragments into a larger
sequence. The Assembler software combines two advanced
computer technologies which maximize the ability to
assemble sequenced DNA fragments into Assemblages, a
special grouping of data in which the relationships between
sequences are shown by graphic overlap, alignment and
statistical views. The process is based on the
Myers-Kececioglu model of fragment assembly (INHERIT™
Assembler User's Manual, Applied Biosystems, Inc., Foster
City, CA) and uses graph theory as the foundation of a
very rigorous multiple sequence alignment engine for
assembling DNA sequence fragments. Other assembly programs
that can be used include MEGALIGN (available from DNASTAR
Inc., Madison, WI), Dasher and STADEN (available from Roger
Staden, Cambridge, England).
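To illustrate what fragment assembly does, the sketch below uses a deliberately simple greedy method. It repeatedly merges the two fragments that share the longest exact suffix-prefix overlap. This is a swapped-in technique, not the Myers-Kececioglu graph model used by the ABI Assembler. It ignores sequencing errors, reverse complements and fragments contained within other fragments. The function names and the minimum overlap of 3 are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Hypothetical sketch: greedily merge the best-overlapping pair until none remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags
```

Greedy merging can make wrong joins in repetitive regions, which is one reason graph-based engines such as the one described above are preferred in practice.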

Next, with reference to Fig. 2, we describe in more
detail the "abundance sort" program, which implements the
above-mentioned "step (b)" to tabulate the number of
sequences of the library which match each database entry
(the "abundance number" for each database entry).
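Step (b), tabulating the abundance number for each database entry, amounts to counting identified sequences per entry. Below is a minimal sketch. It assumes each identified sequence is a (clone, entry) pair, with `None` for a sequence that matched no entry. The function name and record layout are hypothetical and are not taken from the Table 5 FoxBASE listing.

```python
from collections import Counter

def abundance_numbers(identified):
    """Hypothetical sketch.
    identified: iterable of (clone_id, database_entry or None) pairs.
    Returns {database_entry: number of library sequences matching it}."""
    return Counter(entry for _clone, entry in identified if entry is not None)
```

`abundance_numbers(...).most_common()` then returns the entries in order of decreasing abundance number.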

Fig. 2 is a flow chart of a preferred embodiment of
the abundance sort program. A source code listing of this
embodiment of the abundance sort program is set forth in
Table 5. In the Table 5 implementation, the abundance sort
program is written in the FoxBASE programming language,
commercially available from Microsoft Corporation.
Although FoxBASE was chosen for the first iteration of
this technology, this choice should not be considered
limiting. Many other programming languages and database
systems, Sybase being a particularly desirable alternative,
can also be used, as will be obvious to one with ordinary
skill in the art. The subroutine names specified in Fig. 2
correspond to subroutines listed in Table 5.

With reference again to Fig. 2, the "Identified
Sequences" are transcript sequences representing each
sequence of the library together with an identification
of the database entry (if any) which it matches. In other
words, the "Identified Sequences" are transcript sequences
representing the output of the above-discussed "step (a)."

Fig. 3 is a block diagram of a system for implementing
the invention. The Fig. 3 system includes library
generation unit 2, which generates a library and asserts an
output stream of transcript sequences indicative of the
biological sequences comprising the library. Programmed
processor 4 receives the data stream output from unit 2 and
processes this data in accordance with the above-discussed
"step (a)" to generate the Identified Sequences. Processor
4 can be a processor programmed with the commercially
available computer program known as the INHERIT 670
Sequence Analysis System and the commercially available
computer program known as the Factura program (both
available from Applied Biosystems Inc.) and with the UNIX
operating system.

Still with reference to Fig. 3, the Identified
Sequences are loaded into processor 6, which is programmed
with the abundance sort program. Processor 6 generates the
Final Transcript sequences indicated in both Figs. 2 and 3.
Fig. 4 shows a more detailed block diagram of a planned
relational computer system, including various searching
techniques which can be implemented, along with an
assortment of databases to query against.

With reference to Fig. 2, the abundance sort program
first performs an operation known as "Tempnum" on the
Identified Sequences, to discard all of the Identified
Sequences except those which match database entries of
selected types. For example, the Tempnum process can
select Identified Sequences which represent matches of the
following types with database entries (see above for
definitions): "exact" matches; human "homologous" matches;
"other species" matches (representing genes present in
species other than human); "no" matches (no significant
regions of homology with database entries representing
previously identified nucleotide sequences); "x" matches
(the Incyte designation for previously unknown DNA
sequences); or "X" matches (matches to ESTs in the
reference database). This eliminates the U, S, M, V, A, R
and D sequences (see Table 1 for definitions).
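The Tempnum selection can be pictured as a filter on match-type codes. The sketch keeps a record unless its code is one of the U, S, M, V, A, R or D codes named above. The record keys (`clone`, `entry`, `code`) and the function name are hypothetical, and the codes themselves are defined in Table 1 rather than here.

```python
# Codes eliminated by Tempnum, per the text (definitions are in Table 1).
EXCLUDED_CODES = {"U", "S", "M", "V", "A", "R", "D"}

def tempnum(identified):
    """Hypothetical sketch: keep identified sequences whose match-type code is not excluded.
    identified: list of dicts with assumed keys 'clone', 'entry', 'code'."""
    return [rec for rec in identified if rec["code"] not in EXCLUDED_CODES]
```

The comparison is case-sensitive, so the "x" and "X" match types described above remain distinct, and both are retained.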

The identified sequence values selected during the
"Tempnum" process then undergo a further selection
(weeding-out) operation known as "Tempred." This operation
can, for example, discard all identified sequence values
representing matches with selected database entries.

The identified sequence values selected during the
"Tempred" process are then classified according to library
during the "Tempdesig" operation. It is contemplated that
the "Identified Sequences" can represent sequences from a
single library, or from two or more libraries.

Consider first the case in which the identified sequence
values represent sequences from a single library. In this
case, all the identified sequence values determined during
"Tempred" undergo sorting in the "Templib" operation,
further sorting in the "Libsort" operation, and finally
additional sorting in the "Temptarsort" operation. For
example, these three sorting operations can sort the
identified sequences in order of decreasing "abundance
number" (to generate a list of decreasing abundance
numbers, each abundance number corresponding to a unique
identified sequence entry, or several lists of decreasing
abundance numbers, with the abundance numbers in each list
corresponding to database entries of a selected type), with
redundancies eliminated from each sorted list. In this
case, the operation identified as "Cruncher" can be
bypassed, so that the "Final Data" values are the organized
transcript sequences produced during the "Temptarsort"
operation.
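Below is a sketch of the combined effect of the Templib, Libsort and Temptarsort operations, in the variant that produces one list per database-entry type. This is not the Table 5 code. It assumes each identified sequence is reduced to an (entry, entry type) pair. The function name and type labels are hypothetical. Counting by entry removes redundancy, so each entry appears once per list.

```python
from collections import Counter, defaultdict

def sorted_abundance_lists(identified):
    """Hypothetical sketch.
    identified: list of (entry, entry_type) pairs, one per library sequence.
    Returns {entry_type: [(entry, abundance), ...]}, each list in decreasing
    abundance with every entry listed once (redundancy removed)."""
    counts = Counter(identified)  # (entry, type) -> abundance number
    lists = defaultdict(list)
    for (entry, etype), n in counts.items():
        lists[etype].append((entry, n))
    for etype in lists:
        lists[etype].sort(key=lambda pair: (-pair[1], pair[0]))  # ties broken by name
    return dict(lists)
```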

We next consider the case in which the transcript
sequences produced during the "Tempred" operation represent
sequences from two libraries (which we will denote the
"target" library and the "subtractant" library). For
example, the target library may consist of cDNA sequences
from clones of a diseased cell, while the subtractant
library may consist of cDNA sequences from clones of the
diseased cell after treatment by exposure to a drug. As
another example, the target library may consist of cDNA
sequences from clones of a cell type from a young human,
while the subtractant library may consist of cDNA sequences
from clones of the same cell type from the same human at
different ages.
In this case, the "Tempdesig" operation routes all
transcript sequences representing the target library for
processing in accordance with "Templib" (and then "Libsort"
and "Temptarsort"), and routes all transcript sequences
representing the subtractant library for processing in
accordance with "Tempsub" (and then "Subsort" and
"Tempsubsort"). For example, the consecutive "Templib,"
"Libsort," and "Temptarsort" sorting operations sort
identified sequences from the target library in order of
decreasing abundance number (to generate a list of
decreasing abundance numbers, each abundance number
corresponding to a database entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type), with redundancies eliminated from each sorted list.
The consecutive "Tempsub," "Subsort," and "Tempsubsort"
sorting operations likewise sort identified sequences from
the subtractant library in order of decreasing abundance
number, producing either a single list or several lists by
database entry type, with redundancies eliminated from each
sorted list.

The transcript sequences output from the "Temptarsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
target library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type). Similarly, the
transcript sequences output from the "Tempsubsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
subtractant library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type).
The transcript sequences (sorted lists) output from
the Tempsubsort and Temptarsort sorting operations are
combined during the operation identified as "Cruncher."
The "Cruncher" process identifies pairs of corresponding
target and subtractant abundance numbers (both representing
the same identified sequence value), divides one by the
other to generate a "ratio" value for each pair of
corresponding abundance numbers, and then sorts the ratio
values in order of decreasing ratio value. The data output
from the "Cruncher" operation (the Final Transcript
sequences in Fig. 2) is typically a sorted list from which a
histogram could be generated in which position along one
axis indicates the size of a ratio of abundance numbers
(for corresponding identified sequence values from the
target and subtractant libraries) and position along another
axis indicates identified sequence value (e.g., gene type).

Preferably, prior to obtaining a ratio between the two
library abundance values, the Cruncher operation also
divides each abundance number by the total number of
sequences in its library (the target library, the
subtractant library, or both). The resulting lists of
"relative" ratio values generated by the Cruncher operation
are useful for many medical, scientific, and industrial
applications. Also preferably, the output of the Cruncher
operation is a set of lists, each list representing a
sequence of decreasing ratio values for a different
selected subset (e.g., protein family) of database entries.
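Below is a minimal sketch of the Cruncher calculation. It assumes abundance numbers are held in dictionaries keyed by identified sequence value, and that the library totals are supplied separately. Each abundance number is divided by its library total before the target/subtractant ratio is taken. Only entries present in both libraries are paired here; the handling of entries found in just one library is discussed for Example 6.8. The names are hypothetical.

```python
def cruncher(target, subtractant, target_total, subtractant_total):
    """Hypothetical sketch: [(entry, ratio), ...] sorted by decreasing ratio.

    target, subtractant: {identified sequence value: abundance number}
    target_total, subtractant_total: total sequences in each library
    Only entries present in both libraries are paired in this sketch.
    """
    ratios = []
    for entry in target.keys() & subtractant.keys():
        target_rel = target[entry] / target_total                  # relative abundance in target
        subtractant_rel = subtractant[entry] / subtractant_total  # relative abundance in subtractant
        ratios.append((entry, target_rel / subtractant_rel))
    ratios.sort(key=lambda pair: (-pair[1], pair[0]))  # decreasing ratio, ties by name
    return ratios
```

Normalizing by library size keeps a ratio from being inflated simply because one library was sequenced more deeply than the other.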

In one example, the abundance sort program of the
invention tabulates for a library the numbers of mRNA
transcripts corresponding to each gene identified in a
database. These numbers are divided by the total number of
clones sampled. The results of the division reflect the
relative abundance of the mRNA transcripts in the cell type
or tissue from which they were obtained. Obtaining this
final data set is referred to herein as "gene transcript
image analysis." The resulting subtracted data show, in
considerable detail, which genes and proteins are
upregulated and downregulated.
6.6. HUVEC cDNA LIBRARY

Table 2 is an abundance table listing the various gene
transcripts in an induced HUVEC library. The transcripts
are listed in order of decreasing abundance. This
computerized sorting simplifies analysis of the tissue and
speeds identification of significant new proteins which are
specific to this cell type. This type of endothelial cell
lines the tissues of the cardiovascular system, and the more
that is known about its composition, particularly in
response to activation, the more choices of protein targets
become available for treating disorders of this tissue,
such as the highly prevalent atherosclerosis.

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES

Tables 3 and 4 show truncated comparisons of two
libraries. In Tables 3 and 4, the "normal monocytes" are
the HMC-1 cells, and the "activated macrophages" are the
THP-1 cells pretreated with PMA and activated with LPS.
Table 3 lists, in descending order of abundance, the most
abundant gene transcripts for both cell types. With only
15 gene transcripts from each cell type, this table permits
quick, qualitative comparison of the most common
transcripts. This abundance sort, with its convenient
side-by-side display, provides an immediately useful
research tool. In this example, this research tool
discloses that 1) only one of the top 15 activated
macrophage transcripts (poly A binding protein) is found
among the top 15 normal monocyte gene transcripts; and 2) a
new gene transcript (previously unreported in other
databases) is relatively highly represented in activated
macrophages but is not similarly prominent in normal
monocytes. Such a research tool provides researchers with a
short-cut to new proteins, such as receptors, cell-surface
and intracellular signalling molecules, which can serve as
drug targets in commercial drug screening programs. Such a
tool could save considerable time over that consumed by a
hit-and-miss discovery program aimed at identifying
important proteins in and around cells, because those
proteins carrying out everyday cellular functions and
represented as steady-state mRNA are quickly eliminated
from further characterization.

This illustrates how gene transcript profiles change
with altered cellular function. Those skilled in the art
know that the biochemical composition of cells also changes
with other functional changes, such as cancer (including
its various stages) and exposure to toxicity. A gene
transcript subtraction profile such as that in Table 3 is
useful as a first screening tool for such gene expression
and protein studies.

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND
ACTIVATED MONOCYTE-CELL cDNA LIBRARIES

Once the cDNA data were in the computer, the computer
program disclosed in Table 5 was used to obtain ratios
of all the gene transcripts in the two libraries discussed
in Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance in that library is unknown but appears to be less
than 1. As an approximation, genes which are represented in
only one of the two libraries are assigned an abundance of
1/2 in the other library; this makes it possible to obtain
a ratio, which would not be possible if the unrepresented
gene were given an abundance of zero. Using 1/2 for
unrepresented clones increases the relative importance of
"turned-on" and "turned-off" genes, whose products would be
drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
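The 1/2 substitution can be shown in a few lines. In this sketch (the function name is hypothetical), raw abundance numbers are compared over every entry seen in either library. An entry absent from one library is given an abundance of 1/2 there. A gene turned on in the target library therefore yields twice its target count as its ratio instead of a division by zero.

```python
def subtraction_table(target, subtractant, missing=0.5):
    """Hypothetical sketch: target/subtractant ratio for every entry seen in
    either library, using `missing` (1/2, per the text) for an absent entry."""
    entries = target.keys() | subtractant.keys()
    rows = [(e, target.get(e, missing) / subtractant.get(e, missing)) for e in entries]
    rows.sort(key=lambda row: (-row[1], row[0]))  # decreasing ratio, ties by name
    return rows
```

Turned-on genes rise to the top of the table and turned-off genes sink to the bottom, which is the screening behavior described above.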

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table most effectively
highlights the changes in abundance of gene transcripts
upon activation of macrophages. Even among the first 20
gene transcripts listed, there are several unknown gene
transcripts. Thus, electronic subtraction is a useful tool
to help researchers identify much more quickly the basic
biochemical changes between two cell types. Such a tool can
save universities and pharmaceutical companies, which spend
billions of dollars on research, valuable time and
laboratory resources at the early discovery stage, and can
speed up the drug development cycle, which in turn permits
researchers to set up drug screening programs much earlier.
Thus, this research tool provides a way to get new drugs to
the public faster and more economically.

Such a subtraction table can also be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte had a 0 in the
bgfreq column). This screening method is superior to other
screening techniques, such as the western blot, which are
incapable of uncovering such a multitude of discrete new
gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributable to the use of immortalized cell lines and
are inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts, including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions alone.

Smooth muscle senescent protein (22 kDa) was
upregulated in the activated macrophage, which indicates
that it is a candidate to block in controlling
inflammation.
6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND
HEPATITIS-INFECTED LIVER CELL cDNA LIBRARIES

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half are treated with a new anti-hepatitis agent (AHA).
Liver samples are obtained from all rats before exposure to
the hepatitis virus and at the end of AHA treatment or no
treatment. In addition, liver samples can be obtained from
rats with hepatitis just prior to AHA treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample is processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile for all control
samples, all samples from infected rats and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, the data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provide a much
more complex biochemical picture, which researchers have
needed in order to analyze such difficult problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequently,
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, blood or
other relevant samples from subjects known to have, and
known to lack, the relevant condition are tested and
compared to validate the selection of the clinical marker.
In this method, the particular physiologic function of the
protein transcript need not be determined to qualify the
gene transcript as a clinical marker.
25 clinical marker. 

6.10. ELECTRONIC NORTHERN BLOT 

One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one gene
at a time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here the
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined instantaneously and
effortlessly. More candidate genes can thus be scanned,
leading to more frequent and more relevant discoveries.
The computer program included as Table 5 includes a program
for performing this function, and Table 6 is a partial
listing of entries of the database used in the electronic
northern blot analysis.
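An electronic northern blot reduces to looking up one entry in every library's abundance table. The minimal sketch below assumes each library is stored as a dictionary of abundance numbers, and it uses the sum of those numbers as the library total. It reports the count and the fraction of the library for the queried entry, most abundant library first. The names are hypothetical, and this is not the Table 5 program.

```python
def electronic_northern(entry, libraries):
    """Hypothetical sketch: relative abundance of one database entry across libraries.
    libraries: {library_name: {entry: abundance_number}}.
    Returns [(library_name, count, fraction_of_library)], most abundant first."""
    rows = []
    for name, counts in libraries.items():
        total = sum(counts.values())  # assumed library total
        n = counts.get(entry, 0)
        rows.append((name, n, n / total if total else 0.0))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Libraries in which the gene was never sampled still appear, with a count of 0, much as an empty lane would on a physical blot.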

6.11. PHASE I CLINICAL TRIALS

Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients undergo the usual preliminary
clinical laboratory tests. In addition, appropriate
specimens are taken and subjected to gene transcript
analysis. Additional patient specimens are taken at
predetermined intervals during the test and subjected to
gene transcript analysis as described above. In addition,
the gene transcript changes noted in the earlier rat
toxicity study are carefully evaluated as clinical markers
in the followed patients. Changes in the gene transcript
analyses are evaluated as indicators of toxicity by
correlation with clinical signs and symptoms and other
laboratory results. In addition, subtraction is performed
on individual patient specimens and on averaged patient
specimens. The subtraction analysis highlights any
toxicological changes in the treated patients, providing a
highly refined determinant of toxicity. The subtraction
method also annotates clinical markers. Further subgroups
can be analyzed by subtraction analysis, including, for
example, 1) segregation by occurrence and type of adverse
effect; and 2) segregation by dosage.

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES

A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES

The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al.; PCT International
Application Publication No. WO 9322684, published November
11, 1993; PCT International Application Publication No. WO
9306121, published April 1, 1993; or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby expressly incorporated by reference herein.
Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific
embodiments.
TABLE 2

Clone numbers 15000 through 20000
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000
319 genes, for a total of 1713 clones

number  clone  N   entry      descriptor
 1      15365  67  HSRPL41    Riboptn L41
 2      15004  65  NCY015004  INCYTE 015004
 3      15638  63  NCY015638  INCYTE 015638
 4      15390  50  NCY015390  INCYTE 015390
 5      15193  47  HSFIB1     Fibronectin
 6      15220  47  RRRPL9     Riboptn L9
 7      15280  47  NCY015280  INCYTE 015280
 8      15583  33  M62060     EST HHCH09 (IGR)
 9      15662  31  HSACTCGR   Actin, gamma
10      15026  29  NCY015026  INCYTE 015026
11      15279  24  HSEF1AR    Elf 1-alpha
12      15027  23  NCY015027  INCYTE 015027
13      15033  20  NCY015033  INCYTE 015033
14      15198  20  NCY015198  INCYTE 015198
15      15809  20  HSCOLLI    Collagenase
16      15221  19  NCY015221  INCYTE 015221
17      15263  19  NCY015263  INCYTE 015263
18      15290  19  NCY015290  INCYTE 015290
19      15350  18  NCY015350  INCYTE 015350
20      15030  17  NCY015030  INCYTE 015030
21      15234  17  NCY015234  INCYTE 015234
22      15459  16  NCY015459  INCYTE 015459
23      15353  15  NCY015353  INCYTE 015353
24      15378  15  S76965     Ptn kinase inhib
25      15255  14  HUMTHYB4   Thymosin beta-4
26      15401  14  HSLIPCR    Lipocortin I
27      15425  14  HSPOLYAB   Poly-A bp
28      18212  14  HUMTHYMA   Thymosin, alpha
29      18216  14  HSMRP1     Motility relat ptn; MRP-1; CD
30      15189  13  HS18D      Interferon induc ptn 1-8D
31      15031  12  HUMFKBP    FK506 bp
32      15306  12  HSH2AZ     Histone H2A
33      15621  12  HUMLEC     Lectin, B-galbp, 14kDa
34      15789  11  NCY015789  INCYTE 015789
35      16578  11  HSRPS11    Riboptn S11
36      16632  11  M61984     EST HHCA13 (IGR)
37      18314  11  NCY018314  INCYTE 018314
38      15367  10  NCY015367  INCYTE 015367
39      15415  10  HSIFNIN1   Interferon induc mRNA

40 


15633 


10 


HSLDHAR 


Lactate dehydrogenase 


41 


15813 


10 


CHKNMHCB 


C Myosin heavy chain B 


42 


18210 


10 


NCY018210 


INCYTE 018210 


43 


1B233 


10 


HSRPII140 


RNA polymerase II 


44 


18996 


10 


NCY018996 


INCYTE 018996 


45 


15088 


9 


HUMFERL 


Ferritin, light chain 


46 


15714 


9 


NCY015714 


INCYTE 015714 


47 


15720 


9 


NCY015720 


INCYTE 015720 


48 


15863 


9 


NCY015863 


INCYTE 015863 


49 


16121 


9 


HSET 


Endothelin 


50 


18252 


9 


NCY018252 


INCYTE 018252 


51 


15351 


8 


HUKALBP 


Lipid bp, adipocyte 


52 


15370 


8 


NCY015370 


INCYTE 015370 
TABLE 2 Con't

       number    N   entry        descriptor
 53    15670     8   BTCIASHI     NADH-ubiq oxidoreductase
 54    15795     8   NCY015795    INCYTE 015795
 55    16245     8   NCY016245    INCYTE 016245
 56    18262     8   NCY018262    INCYTE 018262
 57    ?         ?   ?            Riboptn L17
 58    ?         ?   ?            Riboptn L1
 59    ?         ?   ?            Actin, beta
 60    ?         ?   NCY015245    INCYTE 015245
 61    ?         ?   NCY015288    INCYTE 015288
 62    ?         ?   ?            G-3-PD
 63    ?         ?   ?            Laminin receptor, 54kDa
 64    ?         ?   ?            Uracil DNA glycosylase
 65    ?         ?   NCY016646    INCYTE 016646
 66    ?         7   HUMPAIA      Plsmnogen activ gene
 67    ?         6   HUMUB        Ubiquitin
 68    ?         6   HSRPS8       Riboptn S8
 69    ?         6   NCY015295    INCYTE 015295
 70    ?         6   RNRPS10R     Riboptn S10
 71    ?         ?   ?            UDP-galactose epimerase
 72    ?         ?   ?            Apolipoptn J
 73    ?         ?   ?            Tubulin, beta
 74    ?         ?   NCY018218    INCYTE 018218
 75    ?         ?   ?            Hydrophobic ptn p27
 76    ?         ?   NCY018963    INCYTE 018963
 77    ?         ?   NCY018997    INCYTE 018997
 78    ?         ?   ?            Galactosidase A, alpha
 79    ?         ?   NCY015475    INCYTE 015475
 80    ?         ?   NCY015721    INCYTE 015721
 81    ?         ?   NCY015865    INCYTE 015865
 82    ?         5   NCY016270    INCYTE 016270
 83    16886     5   NCY016886    INCYTE 016886
 84    18500     5   NCY018500    INCYTE 018500
 85    18503     5   NCY018503    INCYTE 018503
 86    19672     5   RRRPL34      Riboptn L34
 87    15086     4   XLRPL1AR     Riboptn L1a
 88    15113     4   HUMIFNWRS    tRNA synthetase, trp
 89    15242     4   NCY015242    INCYTE 015242
 90    15249     4   NCY015249    INCYTE 015249
 91    15377     4   NCY015377    INCYTE 015377
 92    15407     4   NCY015407    INCYTE 015407
 93    15473     4   NCY015473    INCYTE 015473
 94    15588     4   HSRPS12      Riboptn S12
 95    15684     4   HSEF1G       Elf 1-gamma
 96    15782     4   NCY015782    INCYTE 015782
 97    15916     4   HSRPS18      Riboptn S18
 98    15930     4   NCY015930    INCYTE 015930
 99    16108     4   NCY016108    INCYTE 016108
100    16133     4   NCY016133    INCYTE 016133

? = illegible in source
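Table 2 ranks clones by abundance. The compression subroutines in the Table 5 listing compute counts of this kind. They sort the records on ENTRY and NUMBER, collapse each run of records that share an entry and a designation (D) into the first record of the run, store the run length in RFEND, and then re-sort on RFEND (descending) and NUMBER. The Python sketch below mirrors that logic on in-memory records. It is a sketch under stated assumptions: the `Clone` record type, its field names, and the function name are hypothetical and are not part of the original program.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class Clone(NamedTuple):
    number: int       # clone number (NUMBER field)
    entry: str        # matched database entry (ENTRY field)
    designation: str  # match designation code (D field)


def abundance_table(clones: Iterable[Clone]) -> List[Tuple[int, int, str]]:
    """Return (clone number, count, entry) rows ranked by abundance.

    Clones sharing the same (entry, designation) pair collapse into one row.
    The lowest clone number in each group is kept as the representative,
    which is the record that survives after sorting on ENTRY, NUMBER.
    """
    clones = list(clones)
    counts = Counter((c.entry, c.designation) for c in clones)
    representative = {}
    for c in sorted(clones, key=lambda c: c.number):
        representative.setdefault((c.entry, c.designation), c.number)
    rows = [(representative[key], n, key[0]) for key, n in counts.items()]
    # Most abundant first; ties broken by ascending clone number.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

Take three hypothetical clones numbered 3, 1 and 2, where clones 3 and 1 share entry `A` and designation `E` and clone 2 has entry `B`. They yield `[(1, 2, 'A'), (2, 1, 'B')]`.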
TABLE 4

Libraries: THP-1
Subtracting: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375

1057 genes, for a total of 2151 clones

number  entry       descriptor                   bgfreq  rfend    ratio
10022   HUMIL1      IL-1 beta                         0    131   262.00
10036   HSMDNCF     IL-8                              0    119   238.00
10089   HSLAG1CDN   Lymphocyte activ gene             0     71   142.00
10060   HUMTCSM     RANTES                            0     23   46.000
10003   HUMMIP1A    MIP-1                             3    121   40.333
10689   HSOP        Osteopontin                       0     20   40.000
11050   NCY011050   INCYTE 011050                     0     17   34.000
10937   HSTNFR      TNF-alpha                         0     17   34.000
10176   HSSOD       Superoxide dismutase              0     14   28.000
10886   HSCDW40     B-cell activ, NGF-relat           0     10   20.000
10186   HUMAPR      Early resp PMA-induc              0      9   18.000
10967   HUMGDN      PN-1, glial-deriv                 0      9   18.000
11353   NCY011353   INCYTE 011353                     0      8   16.000
10298   NCY010298   INCYTE 010298                     0      7   14.000
10215   HUM4COLA    Collagenase, type IV              0      6   12.000
10276   NCY010276   INCYTE 010276                     0      6   12.000
10488   NCY010488   INCYTE 010488                     0      6   12.000
11138   NCY011138   INCYTE 011138                     0      6   12.000
10037   HUMCAPPRO   Adenylate cyclase                 1     10   10.000
10840   HUMADCY     Adenylate cyclase                 0      5   10.000
10672   HSCD44E     Cell adhesion glptn               0      5   10.000
12837   HUMCYCLOX   Cyclooxygenase-2                  0      5   10.000
10001   NCY010001   INCYTE 010001                     0      5   10.000
10005   NCY010005   INCYTE 010005                     0      5   10.000
10294   NCY010294   INCYTE 010294                     0      5   10.000
10297   NCY010297   INCYTE 010297                     0      5   10.000
10403   NCY010403   INCYTE 010403                     0      5   10.000
10699   NCY010699   INCYTE 010699                     0      5   10.000
10966   NCY010966   INCYTE 010966                     0      5   10.000
12092   NCY012092   INCYTE 012092                     0      5   10.000
12549   HSRHOB      Oncogene rho                      0      5   10.000
10691   HUMARF1BA   ADP-ribosylation fctr             0      4    8.000
12106   HSADSS      Adenylosuccinate synthetase       0      4    8.000
10194   HSCATHL     Cathepsin L                       0      4    8.000
10479   CLMCYCA     Cyclin A                          0      4    8.000
10031   NCY010031   INCYTE 010031                     0      4    8.000
10203   NCY010203   INCYTE 010203                     0      4    8.000
10288   NCY010288   INCYTE 010288                     0      4    8.000
10372   NCY010372   INCYTE 010372                     0      4    8.000
10471   NCY010471   INCYTE 010471                     0      4    8.000
10484   NCY010484   INCYTE 010484                     0      4    8.000
10859   NCY010859   INCYTE 010859                     0      4    8.000
10890   NCY010890   INCYTE 010890                     0      4    8.000
11511   NCY011511   INCYTE 011511                     0      4    8.000
11868   NCY011868   INCYTE 011868                     0      4    8.000
12820   NCY012820   INCYTE 012820                     0      4    8.000
10133   HSI1RAP     IL-1 antagonist                   0      4    8.000
10516   HUMP2A      Phosphatase, regul 2A             0      4    8.000
11063   HUMB94      TNF-induc response                0      4    8.000
11140   HSHB15RNA   HB15 gene; new Ig                 0      3    6.000
10788   NCY001713   INCYTE 001713                     0      3    6.000
10033   NCY010033   INCYTE 010033                     0      3    6.000
10035   NCY010035   INCYTE 010035                     0      3    6.000
10084   NCY010084   INCYTE 010084                     0      3    6.000
10236   NCY010236   INCYTE 010236                     0      3    6.000
10383   NCY010383   INCYTE 010383                     0      3    6.000
TABLE 4 Con't

number  entry       descriptor                   bgfreq  rfend    ratio
10450   NCY010450   INCYTE 010450                     0      3    6.000
10470   NCY010470   INCYTE 010470                     0      3    6.000
10504   NCY010504   INCYTE 010504                     0      3    6.000
10507   NCY010507   INCYTE 010507                     0      3    6.000
10598   NCY010598   INCYTE 010598                     0      3    6.000
10779   NCY010779   INCYTE 010779                     0      3    6.000
10909   NCY010909   INCYTE 010909                     0      3    6.000
10976   NCY010976   INCYTE 010976                     0      3    6.000
10985   NCY010985   INCYTE 010985                     0      3    6.000
11052   NCY011052   INCYTE 011052                     0      3    6.000
11068   NCY011068   INCYTE 011068                     0      3    6.000
11134   NCY011134   INCYTE 011134                     0      3    6.000
11136   NCY011136   INCYTE 011136                     0      3    6.000
11191   NCY011191   INCYTE 011191                     0      3    6.000
11219   NCY011219   INCYTE 011219                     0      3    6.000
11386   NCY011386   INCYTE 011386                     0      3    6.000
11403   NCY011403   INCYTE 011403                     0      3    6.000
11460   NCY011460   INCYTE 011460                     0      3    6.000
11618   NCY011618   INCYTE 011618                     0      3    6.000
11686   NCY011686   INCYTE 011686                     0      3    6.000
12021   NCY012021   INCYTE 012021                     0      3    6.000
12025   NCY012025   INCYTE 012025                     0      3    6.000
12320   NCY012320   INCYTE 012320                     0      3    6.000
12330   NCY012330   INCYTE 012330                     0      3    6.000
12853   NCY012853   INCYTE 012853                     0      3    6.000
14386   NCY014386   INCYTE 014386                     0      3    6.000
14391   NCY014391   INCYTE 014391                     0      3    6.000
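The ratio column of Table 4 can be checked by hand. Every entry with a bgfreq of 0 has a ratio of exactly twice its rfend (rfend 131 gives 262.00). HUMMIP1A, which occurs 3 times in the background, gives 121/3 = 40.333. This is consistent with the FUSION ROUTINE in Table 5, which uses a background count of 1/2 when an entry is absent from the subtracted libraries. A minimal Python sketch of that computation follows, under these assumptions:

- The function names are hypothetical.
- The dictionary inputs (entry to clone count) are a simplification of the listing's database files.
- The final tie-break on entry name stands in for the listing's sort on DESCRIPTOR.

```python
from typing import Dict, List, Tuple


def subtraction_ratio(rfend: int, bgfreq: int) -> float:
    """Target-library abundance divided by background abundance.

    An entry that never occurs in the background is given a pseudo-count
    of 1/2, so the ratio stays finite (it becomes 2 * rfend).
    """
    actual = bgfreq if bgfreq > 0 else 0.5
    return rfend / actual


def subtract(target: Dict[str, int],
             background: Dict[str, int]) -> List[Tuple[str, int, int, float]]:
    """Return (entry, bgfreq, rfend, ratio) rows, highest ratio first.

    Ties are broken by higher background count, then by entry name.
    """
    rows = []
    for entry, rfend in target.items():
        bgfreq = background.get(entry, 0)
        rows.append((entry, bgfreq, rfend, subtraction_ratio(rfend, bgfreq)))
    rows.sort(key=lambda r: (-r[3], -r[1], r[0]))
    return rows
```

For example, using three rows taken from Table 4 gives the order HUMIL1 (262.0), HUMMIP1A (about 40.333), HUMCAPPRO (10.0):

```python
subtract({"HUMIL1": 131, "HUMMIP1A": 121, "HUMCAPPRO": 10},
         {"HUMMIP1A": 3, "HUMCAPPRO": 1})
```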
TABLE 5

* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF
SET EXACT ON
SET TYPEAHEAD TO 0
CLEAR
SET DEVICE TO SCREEN
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE ' ' TO Target1
STORE ' ' TO Target2
STORE ' ' TO Target3
STORE ' ' TO Object1
STORE ' ' TO Object2
STORE ' ' TO Object3
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PTF
STORE 1 TO BAIL
DO WHILE .T.

* Program.: Subtraction 2.fmt
* Date....: 10/11/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Subtraction 2

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
@ PIXELS ... 241 STYLE 3871 COLOR 0,0,-1,24610,-1,8947
@ PIXELS 25,... SAY "Subtraction Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 117,126 GET Ematch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact" SIZE 15,62 CO
@ PIXELS 135,126 GET Hmatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" ...
@ PIXELS 153,126 GET Omatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spe" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,126 GET Imatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO
@ PIXELS ... GET initiate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS ... GET terminate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,35 SAY "Include clones" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS ...,215 SAY "-->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS ... GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 90,9 TO 191,109 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 90,288 TO 131,397 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS ...,312 SAY "Background:" STYLE 65536 FONT "Geneva",... COLOR 0,0,-1,-1,-1,-1
@ PIXELS ... SAY "Target:" STYLE 65536 FONT "Geneva",270 COLOR ...
@ PIXELS ... GET ... STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS ... GET Object1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,299 GET Object2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS ...,299 GET Object3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 276,324 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Run;Bail out" SIZE ...

* EOF: Subtraction 2.fmt

READ
IF Bail=2
   CLEAR
   CLOSE DATABASES
   USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
   SET SAFETY ON
   SCREEN 1 OFF
   RETURN
ENDIF

STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR
SET TALK ON
GAP = TERMINATE-INITIATE+1
GO INITIATE
COPY NEXT GAP FIELDS NUMBER, LIBRARY, D, F, Z, R, ENTRY, S, DESCRIPTOR, START, RFEND, I TO TEMPNUM
USE TEMPNUM
COUNT TO TOT
COPY TO TEMPRED FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='I'
USE TEMPRED
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. Imatch=0
   COPY TO TEMPDESIG
ELSE
   COPY STRUCTURE TO TEMPDESIG
   USE TEMPDESIG
   IF Ematch=1
      APPEND FROM TEMPNUM FOR D='E'
   ENDIF
   IF Hmatch=1
      APPEND FROM TEMPNUM FOR D='H'
   ENDIF
   IF Omatch=1
      APPEND FROM TEMPNUM FOR D='O'
   ENDIF
   IF Imatch=1
      APPEND FROM TEMPNUM FOR D='I' .OR. D='X' .OR. D='N'
   ENDIF
ENDIF
COUNT TO STARTOT
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
APPEND FROM TEMPDESIG FOR library=UPPER(Target1)
IF Target2<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(Target2)
ENDIF
IF Target3<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(Target3)
ENDIF
COUNT TO ANALTOT
USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR library=UPPER(Object1)
IF Object2<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(Object2)
ENDIF
IF Object3<>' '
   APPEND FROM TEMPDESIG FOR library=UPPER(Object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF

* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY, NUMBER TO LIBSORT
USE LIBSORT
COUNT TO IDGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0   && ROLL
   IF MARK1 >= IDGENE
      PACK
      COUNT TO AUNIQUE
      SW2 = 1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   STORE D TO DESIGA
   SW = 0
   DO WHILE SW=0   && TEST
      SKIP
      STORE ENTRY TO TESTB
      STORE D TO DESIGB
      IF TESTA = TESTB .AND. DESIGA = DESIGB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW = 1
      LOOP
   ENDDO   && TEST
   LOOP
ENDDO   && ROLL
SORT ON RFEND/D, NUMBER TO TEMPTARSORT
USE TEMPTARSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO



* 



* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY, NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0   && ROLL
   IF MARK1 >= SUBGENE
      PACK
      COUNT TO BUNIQUE
      SW2 = 1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   STORE D TO DESIGA
   SW = 0
   DO WHILE SW=0   && TEST
      SKIP
      STORE ENTRY TO TESTB
      STORE D TO DESIGB
      IF TESTA = TESTB .AND. DESIGA = DESIGB






WO 95/20681



PCT/US95/01160 



         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW = 1
      LOOP
   ENDDO   && TEST
   LOOP
ENDDO   && ROLL
SORT ON RFEND/D, NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO

* ****************************************************************
* FUSION ROUTINE
? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
   SELECT 1
   MARK = MARK+1
   IF MARK>BAILOUT
      EXIT
   ENDIF
   GO MARK
   STORE ENTRY TO SCANNER
   SELECT 2
   LOCATE FOR ENTRY=SCANNER
   IF FOUND()
      STORE RFEND TO BIT1
      STORE RFEND TO BIT2
   ELSE
      * Entry absent from the background: use a pseudo-count of 1/2
      STORE 1/2 TO BIT1
      STORE 0 TO BIT2
   ENDIF
   SELECT 1
   REPLACE BGFREQ WITH BIT2
   REPLACE ACTUAL WITH BIT1
   LOOP
ENDDO
SELECT 1
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'
SORT ON RATIO/D, BGFREQ/D, DESCRIPTOR TO FINAL
USE FINAL
SET TALK OFF
DO CASE
   CASE PTF=0
      SET DEVICE TO PRINT
      SET PRINT ON
      EJECT
   CASE PTF=1
      SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
      SET ALTERNATE ON
ENDCASE
STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME
   * SYS(2) counts seconds since midnight; correct for a run that crossed midnight
   STORE FINTIME+86400 TO FINTIME
ENDIF
STORE FINTIME-STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN
SET MARGIN TO 10
@ 1,1 SAY "Library Subtraction Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1



?
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
?? Target1
IF Target2<>' '
   ?? ', '
   ?? Target2
ENDIF
IF Target3<>' '
   ?? ', '
   ?? Target3
ENDIF
? 'Subtracting: '
?? Object1
IF Object2<>' '
   ?? ', '
   ?? Object2
ENDIF
IF Object3<>' '
   ?? ', '
   ?? Object3
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. Imatch=0
   ?? 'All'
ENDIF
IF Ematch=1
   ?? 'Exact, '
ENDIF
IF Hmatch=1
   ?? 'Human, '
ENDIF
IF Omatch=1
   ?? 'Other sp, '
ENDIF
IF Imatch=1
   ?? 'INCYTE '
ENDIF
IF ANAL=1
   ? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2
   ? 'Arranged by function'
ENDIF
? 'Total clones represented: '
?? STR(TOT,5,0)
? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,2)
?? ' minutes'
?
? 'd = designation  f = distribution  z = location  r = function  s = species  i = inte'
?

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
DO CASE
   CASE ANAL=1
      ?? STR(AUNIQUE,4,0)
      ?? ' genes, for a total of '
      ?? STR(ANALTOT,4,0)
      ?? ' clones'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I
      SET PRINT OFF
      CLOSE DATABASES
      USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
   CASE ANAL=2
      ** arrange by function
      SET PRINT ON
      SET HEADING ON

      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
      ?
      ? 'BINDING PROTEINS'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Surface molecules and receptors:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='B'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Calcium-binding proteins:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='C'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Ligands and effectors:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='S'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Other binding proteins:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='I'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
      ? 'ONCOGENES'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'General oncogenes:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='O'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'GTP-binding proteins:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='G'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Viral elements:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='V'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Kinases and Phosphatases:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='V'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Tumor-related antigens:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='A'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
      ? 'PROTEIN SYNTHETIC MACHINERY PROTEINS'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Transcription and Nucleic Acid-binding proteins:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='D'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Translation:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='T'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Ribosomal proteins:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='R'

      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Protein processing:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='L'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
      ?
      ? 'ENZYMES'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Ferroproteins:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='F'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Proteases and inhibitors:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='P'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Oxidative phosphorylation:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Z'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Sugar metabolism:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Q'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Amino acid metabolism:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='M'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Nucleic acid metabolism:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='N'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Lipid metabolism:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='W'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Other enzymes:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='E'
      ?
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
      ?
      ? 'MISCELLANEOUS CATEGORIES'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Stress response:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='H'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Structural:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='K'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? '... clones:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='X'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
      ? 'Clones of unknown function:'
      SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
      list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='U'
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
ENDDO
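Before compression, the subtraction program does two things. It keeps only the records whose designation codes were checked on the menu, and it splits the surviving records into the target libraries (TEMPLIB) and the background libraries (TEMPSUB). A Python sketch of that selection step follows. It assumes each record is a dictionary with `D` and `library` keys, and the function names are hypothetical. The default code set reflects one reading of the partly illegible TEMPRED filter line and should be treated as an assumption.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, str]  # hypothetical: one clone record keyed by field name

# Designation codes as they appear in the listing's FOR clauses (assumed reading).
DEFAULT_CODES = {"E", "O", "H", "N", "I"}   # TEMPRED filter when no box is checked
EXACT, HOMOLOGOUS, OTHER_SPECIES = {"E"}, {"H"}, {"O"}
INCYTE = {"I", "X", "N"}


def select_designations(records: Iterable[Record], ematch: bool = False,
                        hmatch: bool = False, omatch: bool = False,
                        imatch: bool = False) -> List[Record]:
    """Keep records whose D code matches the checked menu boxes."""
    records = list(records)
    if not (ematch or hmatch or omatch or imatch):
        return [r for r in records if r["D"] in DEFAULT_CODES]
    wanted = set()
    if ematch:
        wanted |= EXACT
    if hmatch:
        wanted |= HOMOLOGOUS
    if omatch:
        wanted |= OTHER_SPECIES
    if imatch:
        wanted |= INCYTE
    return [r for r in records if r["D"] in wanted]


def split_libraries(records: Sequence[Record], targets: Iterable[str],
                    objects: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Partition records into target (query) and background sets.

    Blank library names are ignored, mirroring the IF ...<>' ' guards,
    and names are upper-cased as the listing does with UPPER().
    """
    target_set = {t.strip().upper() for t in targets if t.strip()}
    object_set = {o.strip().upper() for o in objects if o.strip()}
    query = [r for r in records if r["library"] in target_set]
    background = [r for r in records if r["library"] in object_set]
    return query, background
```

Unlike the listing, which appends matching records category by category, this sketch filters in one pass. Record order does not matter here because the compression step re-sorts on entry and clone number.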
* Northern (single), version 11-25-94
CLOSE DATABASES
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.

* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,31 TO 46,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 152,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1,

* EOF: Northern (single).fmt
READ
IF Bail=2
   CLEAR
   SCREEN 1 OFF
   RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
   STORE UPPER(Eobject) TO Eobject
   SET SAFETY OFF
   SORT ON Entry TO "Lookup entry.dbf"
   SET SAFETY ON
   USE "Lookup entry.dbf"
   LOCATE FOR Look=Eobject
   IF .NOT.FOUND()
      CLEAR
      LOOP
   ENDIF
   BROWSE
   STORE Entry TO Searchval
   CLOSE DATABASES
   ERASE "Lookup entry.dbf"
ENDIF
IF Dobject<>' '
   SET EXACT OFF
   SET SAFETY OFF
   SORT ON descriptor TO "Lookup descriptor.dbf"
   SET SAFETY ON
   USE "Lookup descriptor.dbf"
   LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
   IF .NOT.FOUND()
      CLEAR






      LOOP
   ENDIF
   BROWSE
   STORE Entry TO Searchval
   CLOSE DATABASES
   ERASE "Lookup descriptor.dbf"
   SET EXACT OFF
ENDIF
IF Numb<>0
   USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
   GO Numb
   BROWSE
   STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
   SCREEN 1 OFF
   RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"

DELETE FOR entered<=0

PACK
COUNT TO TOT
MARK1 = 1
SW2 = 0
DO WHILE SW2=0   && ROLL
   IF MARK1>=TOT
      PACK
      SW2 = 1
      LOOP
   ENDIF
   GO MARK1
   STORE library TO TESTA
   SKIP
   STORE library TO TESTB
   IF TESTA = TESTB
      DELETE
   ENDIF
   MARK1 = MARK1+1
   LOOP
ENDDO   && ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
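The single-Northern program first locates one entry, by entry name, descriptor, or clone number. It then copies every clone carrying that entry into Hits.dbf, and the listing breaks off at that point. Below is a hedged Python sketch of the per-library tally that such an electronic Northern could produce. For each library, it counts the clones matching the entry and expresses that count as a percentage of the library's clones. The function name, the record layout, and the per-library percentage are assumptions, not steps taken from the listing.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def electronic_northern(clones: Iterable[Dict[str, str]],
                        entry: str) -> List[Tuple[str, int, int, float]]:
    """Return (library, hits, library total, percent) for every library."""
    clones = list(clones)
    totals = Counter(c["library"] for c in clones)
    # Tally of the records that COPY TO "Hits.dbf" FOR entry=searchval selects.
    hits = Counter(c["library"] for c in clones if c["entry"] == entry)
    return [(lib, hits[lib], totals[lib], 100.0 * hits[lib] / totals[lib])
            for lib in sorted(totals)]
```

Listing every library, including those with zero hits, makes the output read like a row of lanes on a Northern blot. Libraries in which the transcript is absent still appear.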






* MASTER ANALYSIS 3; VERSION 12-9-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN
SET DEFAULT TO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:"
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF
DO WHILE .T.

* Program.: Master analysis.fmt
* Date....: 12/ 9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis



SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
@ PIXELS 39,255 TO 277,430 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 27,98 SAY "Customized Output Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1
@ PIXELS 45,54 GET conden STYLE 65536 FONT "Chicago",12 PICTURE "@*C Condensed format" SIZE
@ PIXELS 54,261 GET anal STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Sort/number;Sort/entry;
@ PIXELS 117,126 GET EMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact" SIZE 15,62 CO
@ PIXELS 135,126 GET HMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1
@ PIXELS 153,126 GET OMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spe" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",268 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 63,54 GET PRINTON STYLE 65536 FONT "Chicago",12 PICTURE "@*C Include clone listing"
@ PIXELS 171,126 GET Imatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO
@ PIXELS 252,146 GET initiate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,146 GET terminate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 234,134 SAY "Include clones" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,125 SAY "-->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 198,126 GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 189,0 TO 257,120 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 209,3 SAY "Library selection" STYLE 65536 FONT "Geneva",266 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 227,18 GET ENTIRE STYLE 65536 FONT "Chicago",12 PICTURE "@*RV All;Selected" SIZE 16



* EOF: Master analysis. fmt 
READ 

IF ANAL=9

CLEAR

CLOSE DATABASES
ERASE TEMPMASTER.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF

RETURN
ENDIF
clear

? INITIATE
? TERMINATE
? CONDEN
? ANAL
? Ematch
? Hmatch
? Omatch
? IMATCH
SET TALK ON 

IF ENTIRE=2
USE "Unique libraries.dbf"

REPLACE ALL i WITH ' '

BROWSE FIELDS i, libname, library, total, entered AT 0,0
ENDIF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

*COPY TO TEMPNUM FOR NUMBER>=INITIATE .AND. NUMBER<=TERMINATE

*USE TEMPNUM

COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1

APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF

IF ENTIRE=2
USE "Unique libraries.dbf"

COPY TO SELECTED FOR UPPER(i)='Y'
USE SELECTED

STORE RECCOUNT() TO STOPIT
MARK=1

DO WHILE .T.

IF MARK>STOPIT

CLEAR

EXIT

ENDIF

USE SELECTED
GO MARK

STORE library TO THISONE
? 'COPYING '
?? THISONE
USE TEMPLIB

APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

COUNT TO STARTOT

COPY STRUCTURE TO TEMPDESIG

USE TEMPDESIG

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0

APPEND FROM TEMPLIB

ENDIF

IF Ematch=1

APPEND FROM TEMPLIB FOR D='E'
ENDIF

IF Hmatch=1

APPEND FROM TEMPLIB FOR D='H'
ENDIF

IF Omatch=1

APPEND FROM TEMPLIB FOR D='O'
ENDIF

IF Imatch=1

APPEND FROM TEMPLIB FOR D='I' .OR. D='X' .OR. D='N'
ENDIF

IF Xmatch=1

APPEND FROM TEMPLIB FOR D='X'

ENDIF
COUNT TO ANALTOT
set talk off
DO CASE
CASE PTF=0

SET DEVICE TO PRINT

SET PRINT ON

EJECT

CASE PTF=1

SET ALTERNATE TO "Total function sort.txt"

*SET ALTERNATE TO "H and O function sort.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"

*SET ALTERNATE TO "Shear stress HUVEC 1:Clone list.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"

SET ALTERNATE ON

ENDCASE 

********************* 

IF PRINTON=1

@ 61,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1

ENDIF

?

?

? DATE()
?? ' '
?? TIME()

? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '

IF ENTIRE=1

? 'All libraries'

ENDIF

IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT

EXIT
ENDIF

USE SELECTED
GO MARK
? ' '

?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

? 'Designations: '

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0

?? 'All'

ENDIF

IF Ematch=1
?? 'Exact, '

ENDIF

IF Hmatch=1
?? 'Human, '
ENDIF
IF Omatch=1
?? 'Other sp. '

ENDIF

IF Imatch=1
?? 'INCYTE'
ENDIF

IF Xmatch=1
?? 'EST'
ENDIF

IF CONDEN=1

? 'Condensed format analysis '

ENDIF

IF ANAL=1

? 'Sorted by NUMBER'

ENDIF

IF ANAL=2

? 'Sorted by ENTRY'

ENDIF

IF ANAL=3

? 'Arranged by ABUNDANCE'

ENDIF

IF ANAL=4

? 'Sorted by INTEREST'

ENDIF

IF ANAL=5

? 'Arranged by LOCATION '
ENDIF
IF ANAL=6

? 'Arranged by DISTRIBUTION'

ENDIF

IF ANAL=7

? 'Arranged by FUNCTION'
ENDIF

? 'Total clones represented: '

?? STR(STARTOT,6,0)

? 'Total clones analyzed: '

?? STR(ANALTOT,6,0)

? 'l = library d = designation f = distribution z = location r = function c = cer
?

USE TEMPDESIG

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
DO CASE 
CASE ANAL=1

* sort/number

SET HEADING ON
IF CONDEN=1

SORT TO TEMPI ON ENTRY, NUMBER
DO "COMPRESSION number.PRG"
ELSE

SORT TO TEMPI ON NUMBER
USE TEMPI

list off fields number, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR

*list off fields number, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, RFEND, INIT, I
CLOSE DATABASES
ERASE TEMPI.DBF
ENDIF

CASE ANAL=2

* sort/DESCRIPTOR
SET HEADING ON

*SORT TO TEMPI ON DESCRIPTOR, ENTRY, NUMBER/S FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
*SORT TO TEMPI ON ENTRY, DESCRIPTOR, NUMBER/S FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
SORT TO TEMPI ON ENTRY, START/S FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'
IF CONDEN=1

DO "COMPRESSION entry.PRG"
ELSE

USE TEMPI

list off fields number, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, RFEND, INIT, I
CLOSE DATABASES
ERASE TEMPI.DBF
ENDIF
CASE ANAL=3

* sort by abundance
SET HEADING ON

SORT TO TEMPI ON ENTRY, NUMBER FOR D='E' .OR. D='H' .OR. D='O' .OR. D='X' .OR. D='I'

DO "Compression abundance.PRG"

CASE ANAL=4

* sort/interest
SET HEADING ON
IF CONDEN=1

SORT TO TEMPI ON ENTRY, NUMBER FOR I>0
DO "COMPRESSION interest.PRG"
ELSE

SORT ON I/D, ENTRY TO TEMPI FOR I>1
USE TEMPI

list off fields number, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, RFEND, INIT, I
CLOSE DATABASES
ERASE TEMPI.DBF
ENDIF

CASE ANAL=5

* arrange/location
SET HEADING ON
STORE 4 TO AMPLIFIER
? 'Nuclear: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Cytoplasmic: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

*? 'Cytoskeleton: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"

ELSE

DO "Normal subroutine 1"
ENDIF

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"

ENDIF

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression location.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

IF CONDEN=1

SET DEVICE TO PRINTER

SET PRINTER ON

EJECT

DO "Output heading.prg"
USE "Analysis location.dbf"
DO "Create bargraph.prg"

SET HEADING OFF

? ' FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL'

LIST OFF FIELDS Z, NAME, CLONES, GENES, NEW, PERCENT, GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=6

* arrange/distribution
SET HEADING ON
STORE 3 TO AMPLIFIER

? 'Cell/tissue specific distribution: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN

IF CONDEN=1

DO "Compression distrib.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

IF CONDEN=1

DO "Compression distrib.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Unknown distribution: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN

IF CONDEN=1

DO "Compression distrib.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

IF CONDEN=1

SET DEVICE TO PRINTER

SET PRINTER ON

EJECT

DO "Output heading.prg"

USE "Analysis distribution.dbf"

DO "Create bargraph.prg"

SET HEADING OFF

? ' FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL '

?

LIST OFF FIELDS P, NAME, CLONES, GENES, PERCENT, GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=7

* arrange/function

SET HEADING ON

STORE 10 TO AMPLIFIER

? ' BINDING PROTEINS'

?

? 'Surface molecules and receptors: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Calcium-binding proteins: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Ligands and effectors: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Other binding proteins:'

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"

ELSE

DO "Normal subroutine 1"

ENDIF

*EJECT

? ' ONCOGENES'
? 'General oncogenes: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'GTP-binding proteins: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Viral elements: '
SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

*? 'Kinases and Phosphatases: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Tumor-related antigens: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"

ENDIF

* EJECT

? ' PROTEIN SYNTHETIC MACHINERY PROTEINS' 

? 'Transcription and Nucleic Acid-binding proteins: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"

ENDIF

? 'Translation: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"

ENDIF

? 'Ribosomal proteins: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Protein processing:'

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN

IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"

ENDIF

* EJECT 

? ' ENZYMES'

*

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"

ELSE

DO "Normal subroutine 1"
ENDIF

? 'Proteases and inhibitors: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN

IF CONDEN=1

DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"

ENDIF

? 'Oxidative phosphorylation: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN

IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Sugar metabolism: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Amino acid metabolism: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"

ELSE

DO "Normal subroutine 1"
ENDIF

? 'Nucleic acid metabolism: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"

ELSE

DO "Normal subroutine 1"
ENDIF

? 'Lipid metabolism: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"

ELSE

DO "Normal subroutine 1"
ENDIF

? 'Other enzymes: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"

ENDIF

*EJECT

? ' MISCELLANEOUS CATEGORIES'

?

? 'Stress response: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

? 'Structural: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

*? 'Other clones: '

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN
IF CONDEN=1

DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF

? 'Clones of unknown function:'

SORT ON ENTRY, NUMBER FIELDS RFEND, NUMBER, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I, COMMEN

IF CONDEN=1

DO "Compression function.prg"
ELSE

DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
EJECT

*SET DEVICE TO PRINTER

*SET PRINT ON

DO "Output heading.prg"
***

USE "Analysis function.dbf"

DO "Create bargraph.prg"

SET HEADING OFF
***

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,452 PIXELS FONT "Geneva",12 COLOR 0,0,0

*? ' TOTAL TOTAL NEW DIST'

? ' FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'

?

*LIST OFF FIELDS P, NAME, CLONES, GENES, NEW, PERCENT, GRAPH, COMPANY
LIST OFF FIELDS P, NAME, CLONES, GENES, NEW, PERCENT, GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON

*USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=8

DO "Subgroup summary 3.prg"
ENDCASE

DO "Test print.prg"

SET PRINT OFF

SET DEVICE TO SCREEN

CLOSE DATABASES

* ERASE TEMPLIB.DBF

* ERASE TEMPNUM.DBF

* ERASE TEMPDESIG.DBF

* ERASE SELECTED.DBF
CLEAR

LOOP
ENDDO
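The designation filter in MASTER ANALYSIS 3 (the APPEND FROM TEMPLIB ... FOR D=... blocks) can be sketched in Python as follows. This is an illustrative restatement, not the FoxBASE+ program: it assumes records are dictionaries with a one-letter D field, that the "all designations" test checks the Exact, Homologous, Other-species and Incyte flags, and the function name designation_subset is hypothetical. As in the listing, each checked box appends its own matches, so a record can be appended twice (for example a D='X' clone when both the Incyte and EST boxes are checked).

```python
from typing import Dict, List

Record = Dict[str, str]


def designation_subset(templib: List[Record], ematch: int, hmatch: int,
                       omatch: int, imatch: int, xmatch: int) -> List[Record]:
    """Build the TEMPDESIG record list from TEMPLIB according to the menu flags."""
    tempdesig: List[Record] = []
    # No designation box checked (the EST flag is not part of this test): take all.
    if ematch == 0 and hmatch == 0 and omatch == 0 and imatch == 0:
        tempdesig.extend(templib)
    if ematch == 1:
        tempdesig.extend(r for r in templib if r["D"] == "E")
    if hmatch == 1:
        tempdesig.extend(r for r in templib if r["D"] == "H")
    if omatch == 1:
        tempdesig.extend(r for r in templib if r["D"] == "O")
    if imatch == 1:
        tempdesig.extend(r for r in templib if r["D"] in ("I", "X", "N"))
    if xmatch == 1:
        tempdesig.extend(r for r in templib if r["D"] == "X")
    return tempdesig
```

Keeping the appends independent mirrors the listing exactly; a de-duplicating variant would need an explicit set of record numbers, which the original does not use.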
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMPI
COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

COUNT TO NEWGENES FOR D='H' .OR. D='O'

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL
GO TOP

STORE Z TO LOC

USE "Analysis location.dbf"

LOCATE FOR Z=LOC

REPLACE CLONES WITH TOT

REPLACE GENES WITH UNIQUE

REPLACE NEW WITH NEWGENES

USE TEMPI

SORT ON RFEND/D TO TEMP2
USE TEMP2

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'

? ' V Coincidence'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMPI.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
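The compression subroutine above, and its variants that follow, rely on the same two nested loops. On a table already sorted by ENTRY, the TEST loop counts how many consecutive records share the entry at MARK1, deletes the extras, and writes the count into RFEND on the surviving record; the ROLL loop then advances MARK1 by that count. A minimal Python sketch of that run-length collapse follows. It assumes records are dictionaries keyed by the listing's field names, compress_clones is a hypothetical name, and the final descending sort stands in for SORT ON RFEND/D. NEWGENES is counted on the surviving records, as the location and function variants do after PACK.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Record = Dict[str, object]


def compress_clones(temp1: List[Record]) -> Tuple[List[Record], int, int, int]:
    """Collapse runs of equal ENTRY values in a list already sorted on ENTRY.

    Returns (survivors sorted by RFEND descending, TOT, UNIQUE, NEWGENES).
    """
    tot = len(temp1)                      # COUNT TO TOT before any deletion
    survivors: List[Record] = []
    for _, run in groupby(temp1, key=lambda r: r["ENTRY"]):
        members = list(run)
        first = dict(members[0])          # the record at MARK1 is kept
        first["RFEND"] = len(members)     # DUP: clones sharing this entry
        survivors.append(first)
    unique = len(survivors)               # COUNT TO UNIQUE after PACK
    new_genes = sum(1 for r in survivors if r["D"] in ("H", "O"))
    # SORT ON RFEND/D; Python's sort stays stable with reverse=True.
    survivors.sort(key=lambda r: r["RFEND"], reverse=True)
    return survivors, tot, unique, new_genes
```

itertools.groupby only merges adjacent equal keys, which is why, like the FoxBASE+ loop, the sketch depends on the input having been sorted on ENTRY first.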
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMPI
COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE
SW2=1
LOOP
ENDIF
GO MARK1

DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP
ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1

LOOP

ENDDO TEST
LOOP

ENDDO ROLL
* BROWSE

*SET PRINTER ON

SORT ON DATE TO TEMP2

USE TEMP2

?? STR(UNIQUE,4,0)

?? ' genes, for a total of '

?? STR(TOT,4,0)

?? ' clones'

?

? ' V Coincidence'

COUNT TO P4 FOR I=4

IF P4>0

? STR(P4,3,0)

?? ' genes with priority = 4 (Secondary analysis:)'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=4

?

ENDIF

COUNT TO P3 FOR I=3

IF P3>0

? STR(P3,3,0)

?? ' genes with priority = 3 (Full insert sequence:)'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=3

ENDIF

COUNT TO P2 FOR I=2
IF P2>0

? STR(P2,3,0)

?? ' genes with priority = 2 (Primary analysis complete:)'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=2
?

?

ENDIF

COUNT TO P1 FOR I=1
IF P1>0
? STR(P1,3,0)

?? ' genes with priority = 1 (Primary analysis needed:)'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I=1
ENDIF



*SET PRINT OFF
CLOSE DATABASES
ERASE TEMPI.DBF
ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
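The interest variant above ends by grouping the collapsed entries on the priority field I, from 4 down to 1, printing a count line for each non-empty group followed by its clones. A Python sketch of that grouping follows; the priority labels are copied from the listing, while the dictionary layout, the function name priority_report and the simplified per-clone line are assumptions made for illustration.

```python
from typing import Dict, List

# Labels printed by the FoxBASE+ subroutine for each priority level.
PRIORITY_LABELS = {
    4: "Secondary analysis",
    3: "Full insert sequence",
    2: "Primary analysis complete",
    1: "Primary analysis needed",
}


def priority_report(survivors: List[Dict[str, object]]) -> List[str]:
    """Return report lines grouped by priority I, highest level first."""
    lines: List[str] = []
    for level in (4, 3, 2, 1):
        group = [r for r in survivors if r["I"] == level]
        if not group:                     # IF Pn>0 ... ENDIF skips empty groups
            continue
        # STR(Pn,3,0) right-justifies the count in three columns.
        lines.append(f"{len(group):3d} genes with priority = {level} "
                     f"({PRIORITY_LABELS[level]}:)")
        for r in group:                   # simplified stand-in for LIST OFF FIELDS
            lines.append(f"  {r['NUMBER']} {r['RFEND']} {r['ENTRY']}")
    return lines
```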
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMPI

COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

SW2 = 1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL

* BROWSE

*SET PRINTER ON

SORT ON NUMBER TO TEMP2

USE TEMP2

?? STR(UNIQUE,4,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' V Coincidence'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMPI.DBF
ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMPI

COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

COUNT TO NEWGENES FOR D='H' .OR. D='O'

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL
GO TOP

STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMPI

SORT ON RFEND/D TO TEMP2

USE TEMP2

SET HEADING ON

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

***

? ' V Coincidence'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
***

* SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 266,492 PIXELS FONT "Geneva",13 COLOR 0,0,
*list off fields RFEND, S, DESCRIPTOR

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMPI.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
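Before listing, the function subroutine above writes its totals into the matching row of "Analysis function.dbf" (LOCATE FOR P=FUNC, then REPLACE CLONES, GENES and NEW); the location variant does the same for its own table, and the distribution variant updates CLONES and GENES only. The Python sketch below shows that bookkeeping for one class. It assumes the summary table is a dictionary keyed by class code whose rows already exist and that the records passed in are the clones of that single class; update_class_row is a hypothetical helper. As in the listing, the record that represents an entry is the first one after sorting on ENTRY, NUMBER, and NEW counts representatives designated H or O.

```python
from typing import Dict, List

Record = Dict[str, object]


def update_class_row(summary: Dict[str, Dict[str, int]], code: str,
                     records: List[Record]) -> None:
    """Fill CLONES, GENES and NEW for one class code, in place."""
    ordered = sorted(records, key=lambda r: (r["ENTRY"], r["NUMBER"]))
    representatives: Dict[object, Record] = {}
    for rec in ordered:
        # The first record of each ENTRY run survives the compression loop.
        representatives.setdefault(rec["ENTRY"], rec)
    row = summary[code]                   # LOCATE FOR P=FUNC (row assumed present)
    row["CLONES"] = len(records)          # REPLACE CLONES WITH TOT
    row["GENES"] = len(representatives)   # REPLACE GENES WITH UNIQUE
    row["NEW"] = sum(1 for r in representatives.values()
                     if r["D"] in ("H", "O"))  # REPLACE NEW WITH NEWGENES
```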
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMPI
COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP
ENDIF

GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL
GO TOP

STORE F TO DIST

USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMPI

SORT ON RFEND/D TO TEMP2

USE TEMP2

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' V Coincidence'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMPI.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS

USE TEMPI

COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL

GO TOP

USE TEMPI

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' V Coincidence'

list off fields number, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I

*SET PRINT OFF
CLOSE DATABASES
ERASE TEMPI.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
COPY TO TEMPI FOR
USE TEMPI

COUNT TO IDGENE FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='R' .OR. D='A'

DELETE FOR D='N' .OR. D='D' .OR. D='A' .OR. D='U' .OR. D='S' .OR. D='M' .OR. D='R' .OR. D='V'

PACK

COUNT TO TOT

REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

IF TESTA = TESTB

DELETE

DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1

LOOP

ENDDO TEST
LOOP

ENDDO ROLL
* BROWSE

*SET PRINTER ON

SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2

REPLACE ALL START WITH RFEND/IDGENE*10000

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' Coincidence V V Clones/10000'

set heading off

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,

list fields number, RFEND, START, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, INIT, I

*SET PRINT OFF

CLOSE DATABASES

ERASE TEMPI.DBF

ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
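The abundance subroutines express each gene's clone count per 10,000 identified clones: IDGENE counts clones designated E, O, H, N, R or A before the DELETE FOR step removes N, D, A, U, S, M, R and V records, and START is then set to RFEND/IDGENE*10000. A minimal Python sketch of that arithmetic follows. It assumes dictionary records with D and ENTRY fields, the function name abundance_per_10000 is hypothetical, the truncated FOR clause of the COPY TO TEMPI command is not modeled, and returning an empty list when IDGENE is zero is an added guard that the listing does not show. Ties in the count keep first-appearance order rather than NUMBER order.

```python
from collections import Counter
from typing import Dict, List, Tuple

COUNTED = {"E", "O", "H", "N", "R", "A"}             # COUNT TO IDGENE FOR ...
DELETED = {"N", "D", "A", "U", "S", "M", "R", "V"}   # DELETE FOR ...


def abundance_per_10000(records: List[Dict[str, str]]) -> List[Tuple[str, int, float]]:
    """Return (ENTRY, RFEND, START) rows, most abundant first."""
    idgene = sum(1 for r in records if r["D"] in COUNTED)
    if idgene == 0:
        return []                                     # guard not in the listing
    kept = [r for r in records if r["D"] not in DELETED]
    counts = Counter(r["ENTRY"] for r in kept)        # RFEND per surviving entry
    rows = [(entry, n, n / idgene * 10000) for entry, n in counts.items()]
    rows.sort(key=lambda row: row[1], reverse=True)   # SORT ON RFEND/D
    return rows
```

Note that N, R and A clones enlarge the denominator but are themselves deleted, so the START values of the remaining genes do not sum to 10,000 whenever such clones are present.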
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMPI

COUNT TO IDGENE FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='R' .OR. D='A'

DELETE FOR D='N' .OR. D='D' .OR. D='A' .OR. D='U' .OR. D='S' .OR. D='M' .OR. D='R' .OR. D='V'

PACK

COUNT TO TOT
REPLACE ALL RFEND WITH 1

MARK1 = 1

SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= TOT
PACK

COUNT TO UNIQUE

SW2=1
LOOP
ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP+1

LOOP

ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP
SW=1
LOOP

ENDDO TEST
LOOP

ENDDO ROLL
*BROWSE

*SET PRINTER ON

SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2

REPLACE ALL START WITH RFEND/IDGENE*10000

?? STR(UNIQUE,5,0)

?? ' genes, for a total of '

?? STR(TOT,5,0)

?? ' clones'

? ' Coincidence V V Clones/10000'

set heading off 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,

list fields number, RFEND, START, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, INIT, I

*SET PRINT OFF

CLOSE DATABASES

ERASE TEMPI.DBF

ERASE TEMP2.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
USE TEMPI
COUNT TO TOT

?? ' Total of '

?? STR(TOT,4,0)
?? ' clones'
?

*list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR, LENGTH, RFEND, INIT, I
list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR
CLOSE DATABASES
ERASE TEMPI.DBF
USE TEMPDESIG
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*Lifescan menu; version 8-7-94 

SETT TALK OFF 

sec device to screen 

CLEAR 

USE "SirartGuy:FoxBA5E+/Macifox f iles : clones .dbf ' 
STORE LUFDATE C ) TO Update 
GO BOTTOM 

STORE RECNO ( ) TO cloneno 
STORE 6 TO Chooser 
DO WHILE .T. 

* Prcgrcnw * Lifeseq menu. font: 

* Date, . . . t 1/11/95 

* Version.: FoxEASE+/Mac, revision 1.10 

* Notes. . . . ; Ferns t file Lifesaq menu 
* 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",268 COLOR 0,6,
@ PIXELS 18,126 TO 77,365 STYLE 28479 COLOR 32767,-25600,-1,-16223,-16721,-15725
@ PIXELS 110,29 TO 188,217 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1

@ PIXELS 45,161 SAY "LIFESEQ" STYLE 65536 FONT "Geneva",536 COLOR 0,0,-1,-1,7135,5884

@ PIXELS 36,269 SAY "TM" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,7135,5884

@ PIXELS 63,143 SAY "Molecular Biology Desktop" STYLE 65536 FONT "Helvetica",18 COLOR 0,0,0,

@ PIXELS 90,252 TO 251,467 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1

@ PIXELS 117,270 GET Chooser STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Transcript profiles
@ PIXELS 135,128 SAY Update STYLE 0 FONT "Geneva",12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 171,128 SAY cloneno STYLE 0 FONT "Geneva",12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 135,44 SAY "Last update:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,44 SAY "Total clones:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 43,296 SAY "v1.30" STYLE 65536 FONT "Geneva",782 COLOR 0,0,-1,-1,-1,-1

* EOF: Lifeseq menu.fmt
READ 

DO CASE 

CASE Chooser=1

DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
CASE Chooser=2

DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Subtraction 2.prg"
CASE Chooser=3

DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Northern (single).prg"

CASE Chooser=4

USE "Libraries.dbf"

BROWSE

CASE Chooser=5

DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:see individual clone.prg"
CASE Chooser=6

DO "SmartGuy:FoxBASE+/Mac:fox files:Libraries:Output programs:Menu.prg"

CASE Chooser=7

CLEAR 

SCREEN 1 OFF 
RETURN 

ENDCASE 

LOOP 
ENDDO 
@ 61,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1

?

?

?

?

? DATE()

?? TIME()

? 'Clone numbers '

?? STR(INITIATE,6,0)

?? ' through '

?? STR(TERMINATE,6,0)

? 'Libraries: '

IF ENTIRE=1

? 'All libraries'

ENDIF

IF ENTIRE=2
MARK=1
DO WHILE .T.
IF MARK>STOPIT
EXIT
ENDIF

USE SELECTED

GO MARK
? ' '

?? TRIM(libname)
STORE MARK+1 TO MARK
LOOP
ENDDO
ENDIF

? 'Designations: '

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0

?? 'All'

ENDIF

IF Ematch=1
?? 'Exact, '
ENDIF

IF Hmatch=1
?? 'Human, '
ENDIF

IF Omatch=1
?? 'Other sp. '
ENDIF

IF CONDEN=1

? 'Condensed format analysis'

ENDIF

IF ANAL=1

? 'Sorted by NUMBER '

ENDIF

IF ANAL=2

? 'Sorted by ENTRY '

ENDIF

IF ANAL=3

? 'Arranged by ABUNDANCE'

ENDIF

IF ANAL=4

? 'Sorted by INTEREST'

ENDIF

IF ANAL=5

? 'Arranged by LOCATION '

ENDIF

IF ANAL=6

? 'Arranged by DISTRIBUTION'

ENDIF

IF ANAL=7

? 'Arranged by FUNCTION'
ENDIF

? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)

? 
USE TEMPI

COUNT TO TOT

?? ' Total of '

?? STR(TOT,4,0)

?? ' clones'
?

*list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR, LENGTH, RFEND, INIT, I
list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR
CLOSE DATABASES
ERASE TEMPI.DBF
USE TEMPDESIG
USE TEMPI
COUNT TO TOT
?? ' Total of '
?? STR(TOT,4,0)

?? ' clones'

*list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR, LENGTH, RFEND, INIT, I

list off fields number, L, D, F, Z, R, C, ENTRY, DESCRIPTOR

CLOSE DATABASES
ERASE TEMPI.DBF

USE TEMPDESIG
*Northern (single), version 11-25-94

close databases

SET TALK OFF

SET PRINT OFF

SET EXACT OFF

CLEAR

STORE ' ' TO Eobject

STORE ' ' TO Dobject

STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.

* Program..: Northern (single).fmt

* Date....: 3/ 3/94

* Version.: FoxBASE+/Mac, revision 1.10

* Notes...: Format file Northern (single)

* 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,51 TO 45,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,75 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1

@ PIXELS 115,88 SAY "Entry" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1

@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1

@ PIXELS 145,89 SAY "Description" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1

@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1

@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-

@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE

@ PIXELS 175,93 SAY "Clone" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1

@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1

@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1,

* 

* EOF: Northern (single).fmt

READ 

IF Bail=2
CLEAR 

screen 1 off 

RETURN 

ENDIF 

USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON

IF Eobject<>' '

STORE UPPER(Eobject) TO Eobject

SET SAFETY OFF

SORT ON Entry TO "Lookup entry.dbf"

SET SAFETY ON

USE "Lookup entry.dbf"

LOCATE FOR Look=Eobject

IF .NOT. FOUND()

CLEAR

LOOP

ENDIF 

BROWSE 

STORE Entry TO Searchval 

CLOSE DATABASES 

ERASE "Lookup entry.dbf"

ENDIF 

IF Dobject<>' '
SET EXACT OFF
SET SAFETY OFF

SORT ON descriptor TO "Lookup descriptor.dbf"

SET SAFETY ON

USE "Lookup descriptor.dbf"

LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))

IF .NOT. FOUND()

CLEAR
LOOP

ENDIF 

BROWSE 

STORE Entry TO Searchval 

CLOSE DATABASES 

ERASE "Lookup descriptor.dbf"

SET EXACT ON 

ENDIF 

IF Numb<>0

USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"

GO Numb 

BROWSE 

STORE Entry TO Searchval 
ENDIF 

CLEAR 

? 'Northern analysis for entry '

?? Searchval
?

* 

? 'Enter Y to proceed' 

WAIT TO OK 

CLEAR 

IF UPPER(OK)<>'Y'
screen 1 off 
RETURN 
ENDIF 

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'

USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF

SORT ON library TO "Compressed libraries.dbf"

* FOR entered>0

SET SAFETY ON

USE "Compressed libraries.dbf"

DELETE FOR entered=0

PACK

COUNT TO TOT
MARK1=1
SW2=0

DO WHILE SW2=0 ROLL 

IF MARK1 >= TOT

PACK 

SW2=1 

LOOP 

ENDIF 
GO MARK1
STORE library TO TESTA
SKIP

STORE library TO TESTB
IF TESTA=TESTB
DELETE
ENDIF

MARK1=MARK1+1
LOOP 

ENDDO ROLL 

* Northern analysis 
CLEAR 

? 'Doing the northern now...'
SET TALK ON 

USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF

COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON 






WO 95/20681 



PCT/US95/01160 



CLOSE DATABASES 
SELECT 1 

USE "Compressed libraries.dbf"

STORE RECCOUNT() TO Entries

SELECT 2

USE "Hits.dbf"

Mark=1

DO WHILE .T.

SELECT 1

IF Mark>Entries

EXIT

ENDIF
GO MARK 

STORE library TO Jigger 
SELECT 2 

COUNT TO Zog FOR library=Jigger
SELECT 1 

REPLACE hits with Zog 

Mark=Mark+1

LOOP 

ENDDO 



SELECT 1

BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS AT 0,0

CLEAR 

? 'Enter Y to print: ' 

WAIT TO PRINSET

IF UPPER(PRINSET)='Y'

SET PRINT ON 

CLEAR 

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
? 'DATABASE ENTRIES MATCHING ENTRY '
?? Searchval
? DATE()

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 266,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
LIST OFF FIELDS library,libname,entered,hits

* 
* 

SELECT 2 

LIST OFF FIELDS NUMBER,LIBRARY,D,S,F,Z,R,ENTRY,DESCRIPTOR,RFSTART,START,RFEND

SET TALK OFF 
SET PRINT OFF 
ENDIF

CLOSE DATABASES 
SET TALK OFF 
CLEAR 

DO "Test print. prg" 
RETURN 
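The compression subroutine and the per-library hit count in the Northern program can be restated as a short Python sketch. The dictionaries with `library`, `entered`, `entry` and `hits` keys stand in for the database records, and the function names are hypothetical. As in the ROLL loop, rows with `entered` equal to zero are dropped and, after sorting, a record is discarded whenever its library code repeats the previous one, so the first row of each run survives.

```python
from collections import Counter


def compress_libraries(libraries: list[dict]) -> list[dict]:
    """Drop rows with entered == 0, sort by library code, keep one row per code."""
    # DELETE FOR entered=0 / PACK, combined with SORT ON library
    rows = sorted((r for r in libraries if r["entered"] != 0),
                  key=lambda r: r["library"])
    kept: list[dict] = []
    for r in rows:
        # Same test as TESTA=TESTB: a repeat of the previous code is dropped.
        if kept and kept[-1]["library"] == r["library"]:
            continue
        kept.append(dict(r))  # copy so the caller's records are not modified
    return kept


def northern(libraries: list[dict], clones: list[dict], searchval: str) -> list[dict]:
    """Count, for each library, the clones whose entry equals searchval."""
    # COPY TO "Hits.dbf" FOR entry=searchval, then COUNT ... FOR library=Jigger
    hits = Counter(c["library"] for c in clones if c["entry"] == searchval)
    table = compress_libraries(libraries)
    for row in table:
        row["hits"] = hits.get(row["library"], 0)  # REPLACE hits WITH Zog
    return table
```

Using a `Counter` replaces the nested loop of one COUNT pass per library with a single pass over the hits, which gives the same per-library totals.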
TABLE 6

library    libname
ADENINB01  Inflamed adenoid
ADRENOR01  Adrenal gland (0)
ADRENOT01  Adrenal gland (T)
AMLBNOT01  AML blast cells (T)
BMARNOT01  Bone marrow
BMARNOT02  Bone marrow (T)
CARDNOT01  Cardiac muscle (T)
CHAONOT01  Chin. hamster ovary
CORNNOT01  Corneal stroma
FIBRAGT01  Fibroblast, AT 5
FIBRAGT02  Fibroblast, AT 30
FIBRANT01  Fibroblast, AT
FIBRNGT01  Fibroblast, uv 5
FIBRNGT02  Fibroblast, uv 30
FIBRNOT01  Fibroblast
FIBRNOT02  Fibroblast, normal
HMC1NOT01  Mast cell line HMC-1
HUVELPB01  HUVEC IFN,TNF,LPS
HUVENOB01  HUVEC control
HUVESTB01  HUVEC shear stress
HYPONOS01  Hypothalamus
KIDNNOT01  Kidney (T)
LIVRNOT01  Liver (T)
LUNGNOT01  Lung (T)
MUSCNOT01  Skeletal muscle (T)
OVIDNOS01  Oviduct
PANCNOT01  Pancreas, normal
PITUNOR01  Pituitary (r)
PITUNOT01  Pituitary (T)
PLACNOB01  Placenta
SINTNOT02  Small intestine (T)
SPLNFET01  Spleen/liver, fetal
SPLNNOT02  Spleen (T)
STOMNOT01  Stomach
SYNORAB01  Rheum. synovium
TBLYNOT01  T B lymphoblast
TESTNOT01  Testis (T)
THP1NOB01  THP-1 control
THP1PEB01  THP phorbol
THP1PLB01  THP-1 phorbol LPS
U937NOT01  U937, monocytic leuk






number       library    d s f z r  entry    descriptor                rfstart  start  rfend
2304         U937NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        0      773
3340         HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        370    773
3259         HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        371    773
[illegible]  HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        470    773
[illegible]  HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        327    773
9139         HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        375    773
WHAT IS CLAIMED IS:

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;
(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.
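Steps (c) and (d) of claim 1 can be illustrated with a deliberately simple Python sketch. It assumes reference sequences keyed by annotation and scores each transcript with an ungapped identity over the shorter length, a stand-in chosen for brevity rather than the comparison the programmed computer actually performs. The degree of match is reported as "exact" or "similar" (compare the exact and non-exact values of claim 6); the function names and the 0.8 threshold are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter length (ungapped toy score)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def identify(transcript: str, references: dict[str, str],
             threshold: float = 0.8) -> tuple[str, str]:
    """Step (c): return (annotation, degree of match) for one transcript."""
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = identity(transcript, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score == 1.0:
        return (best_name, "exact")
    if best_name is not None and best_score >= threshold:
        return (best_name, "similar")
    return ("unmatched", "none")


def count_identified(transcripts: list[str], references: dict[str, str],
                     threshold: float = 0.8) -> Counter:
    """Step (d): number of times each identified sequence value occurs."""
    return Counter(identify(t, references, threshold) for t in transcripts)
```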

2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;

making cDNA copies of the mRNA;

isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological 
sequences are protein sequences. 
6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.
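Step (f) of claim 7 can be sketched in Python as a per-value ratio of the two libraries' counts. The function name, the optional normalization by library size, and the use of None when a value is absent from the second library (so the ratio is undefined) are assumptions made for illustration; the claim itself only requires ratio values indicative of differences in transcript numbers.

```python
from collections import Counter


def transcript_ratios(first: Counter, second: Counter,
                      normalize: bool = False) -> dict:
    """Per identified value, the count in the first library over the second.

    With normalize=True each count is first divided by its library total.
    A value absent from the second library gets None (ratio undefined).
    """
    n1 = (sum(first.values()) or 1) if normalize else 1
    n2 = (sum(second.values()) or 1) if normalize else 1
    ratios = {}
    for key in set(first) | set(second):
        a, b = first.get(key, 0), second.get(key, 0)
        ratios[key] = (a / n1) / (b / n2) if b else None
    return ratios
```

Normalizing matters when the two libraries contain different numbers of clones, since raw count ratios would then partly reflect library size rather than expression.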

8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.
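Steps (c) and (d) of claim 8 reduce to counting and dividing by the total, as in this minimal Python sketch. It assumes the sequence-specific identification of step (b) has already produced one gene identifier per transcript; the function name and the identifiers used with it are hypothetical.

```python
from collections import Counter


def relative_abundance(gene_calls: list[str]) -> dict[str, float]:
    """Fraction of the transcript population assigned to each gene.

    gene_calls holds one gene identifier per transcript (output of step (b));
    counting is step (c) and dividing by the total is step (d).
    """
    counts = Counter(gene_calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in counts.items()}
```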

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximates the gene
transcript image of the biological specimen.

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.
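The comparison in step (f) of claim 10 can be sketched in Python by ranking standard images by their distance from the specimen's image. The claims do not specify a similarity measure; the sum of absolute differences in relative abundance used here (an L1 distance, with genes missing from one image treated as zero) is an assumed stand-in, and the function names are hypothetical.

```python
def l1_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Sum of absolute abundance differences over all genes in either image."""
    genes = set(a) | set(b)
    return sum(abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in genes)


def rank_standards(image: dict[str, float],
                   standards: dict[str, dict[str, float]]) -> list[str]:
    """Standard image names ordered from closest to farthest."""
    return sorted(standards, key=lambda name: l1_distance(image, standards[name]))
```

The first name returned is the standard image that most closely approximates the specimen under this particular distance; a different measure could rank the standards differently.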

12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;

(c) inserting the cDNA into a suitable vector and
using said vector to transfect suitable host strain cells
which are plated out and permitted to grow into clones,
each clone representing a unique mRNA;

(d) isolating a representative population of
recombinant clones;

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies the gene from which the unique mRNA was transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.
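Steps (f) and (g) of claim 12 can be sketched in Python as a count followed by a descending sort. The sketch assumes one gene identifier per recombinant clone is already available from step (e); the function name, the percentage column and the alphabetical tie-breaking are illustrative choices, not part of the claim.

```python
from collections import Counter


def transcript_image(gene_per_clone: list[str]) -> list[tuple[str, int, float]]:
    """List (gene, clone count, percent of clones), most abundant first.

    Ties are broken alphabetically so the listing is reproducible.
    """
    counts = Counter(gene_per_clone)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n, 100.0 * n / total) for gene, n in ranked]


for gene, n, pct in transcript_image(["EF1B", "ACTB", "EF1B", "GAPDH", "EF1B", "ACTB"]):
    print(f"{gene:<8}{n:4d}{pct:8.1f}%")
```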



13. The method of claim 12, also including the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from a random sample of normal and diseased humans,
encompassing a variety of diseases, to produce reference
sets of normal and diseased gene transcript images;

obtaining a test specimen from a human, and producing
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the
reference sets of gene transcript images; and

identifying at least one of the reference gene
transcript images which most closely approximates the test
gene transcript image.

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the
computer system in which a database of reference transcript
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.



15. The system of claim 14, also including:
library generation means for producing the library of
biological sequences and generating said set of transcript
sequences from said library.



16. The system of claim 15, wherein the library
generation means includes:

means for obtaining a mixture of mRNA;

means for making cDNA copies of the mRNA;

means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;

means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.